|By Dan Smith||
|August 16, 2012 06:00 AM EDT||
Customer engagement has long benefited from data and analytics. Knowing more about each of your customers, their attributes, preferences, behaviors and patterns, is essential to fostering meaningful engagement with them. As technologies advance, and more of people's lives are lived online, more and more data about customers is captured and made available. At face value, this is good; more data means better analytics, which means better understanding of customers and therefore more meaningful engagement. However, volumes of data measured in terabytes, petabytes, and beyond are so big they have spawned the terms "Big Data" and "Big Analytics." At this scale, there are practical considerations that must be understood to successfully reap the benefits for customer engagement. This article will explore some of these considerations and provide some suggestions on how to address them.
Customer Data Management (CDM), also known as Customer Data Integration (CDI), is foundational for a Customer Intelligence (CI) or Customer Engagement (CE) system. CDM is rooted in the principles of Master Data Management (MDM), which includes the following:
- Acquisition and ingestion of multiple, disparate sources, both online and offline, of customer and prospect data
- Change Data Capture (CDC)
- Data cleansing, parsing, and standardization
- Entity Modeling
- Entity relationship and hierarchy management
- Entity matching, identity resolution, and persistent key management for key individual, household, company/institution/location entities
- Rules-based attribute mastering, "Survivorship" or "Build the Best Record"
- Data lineage, version history, audit, aging, and expiration
It's useful to first make the distinction between attributive and behavioral data. Attributive data, often referred to as profile data, is discrete fields that describe an entity such as an individual's name, address, age, eye color, and income. Behavioral data is a series of events that describe an entity's behavior over time, such as phone calls, web page visits, and financial transactions. Admittedly, there is a slippery slope between the two; a customer's current account balance can be either an attribute or an aggregation of behavioral transactions.
MDM typically focuses on attributive data. Being based on MDM, the same is true for CDM. Personally Identifying Information (PII) such as name, email, address, phone, and username are the primary drivers behind identity resolution. Other attributes such as income, number of children, or gender are attributes that are commonly "mastered" for each of the resolved entities (individual, household, company).
Enter Big Data. As more devices are developed - and adopted - that capture and store data, huge quantities of data are generated. Big Data, by definition, is almost always event-oriented and temporal, and the subset of Big Data that is relevant to a CE system is almost always behavioral in nature (clicks, calls, downloads, purchases, emails, texts, tweets, Facebook posts). Behavioral data is critical to understanding customers (and prospects). And, understanding customers is critical for establishing meaningful and welcome engagement with them. Therefore, Big Data is, or should be, viewed as an invaluable asset to any CE system.
Further, this sort of rich, temporal behavioral data is ripe for analytics. In fact, the term Big Analytics has emerged as a result. Big Analytics can be defined as the ability to execute analytics on Big Data. However, there are some real challenges involved in executing analytics on Big Data, challenges that drive the need for specialized technologies such as Hadoop or Netezza (or both). These technologies must support Massively Parallel Processing (MPP) and, just as importantly if not more so, they must bring the analytics to the data instead of bringing the data to the analytics. Having recently completed a course for Hadoop developers (an excellent course that I highly recommend), I have a heightened appreciation for the challenges related to managing and analyzing data "at scale" and the need for specialized technologies that support Big Data and Big Analytics.
A few significant points regarding Big Analytics should be considered:
- Big Analytics allow the build of models on an entire data set, rather than just a sampling or an aggregation. My colleague, Jack McCush, explains: "When building models on a small subset and then validating them against a larger set to make sure the assumptions hold, you can miss the ability to predict rare events. And often those rare events are the ones that drive profit."
- Big Analytics allow the build of non-traditional models, for example, social graphs and influencer analytics. Several useful and inherently big sources of data such as Call Detail Records (CDRs) generated from mobile/smart phones and web clickstream data both lend themselves well to these models.
- Big Analytics can take even traditional analytics to the next level. Big Analytics allows the execution of traditional correlation and clustering models in a fraction of the time, even with billions of records and hundreds of variables. As Revolution Analytics points out in Advanced 'Big Data' Analytics with R and Hadoop, "Research suggests that a simple algorithm with a large volume of data is more accurate than a sophisticated algorithm with little data. The algorithm is not the competitive advantage; the ability to apply it to huge amounts of data-without compromising performance-generates the competitive advantage."
Big Data is great for a CE system. It paints a rich behavioral picture of customers and prospects and takes CE-enabling analytics to the next level. But what happens when this massive behavioral data is thrown at a CDM/MDM system that is optimized for attributive data? A "basketball through the garden hose" effect might occur. But this doesn't have to happen; there are ways to gracefully extend CDM to manage Big Data.
The key is data classification. Attributive, or profile, data is classified separately from behavioral data. While both contain Source Native Key (e.g., cookie-based visitor id, cell phone number, device id, account number), attributive data can be structured only. Behavioral data, on the other hand, can be structured and unstructured and contains no PII. Big Data almost always falls under the behavioral category.
Importantly, behavioral data requires different processing than attributive data. Since the processing is different, the two streams can be separated just after ingestion, like a fork in the road, with the attributive data going one way and the behavioral data going the other. This is the key to integrating Big Data into a CDM-MDM system without grinding it to a halt. To be fair, the two streams aren't completely independent. The behavioral stream will typically require two things from the attributive stream: Dimension Tables and Master ID-to-Natural Key Cross-References - both of which can be considered as reference data.
For example, the "subscriber" dimension table may be required in the Big Data world so that it can be joined to the "web clicks" table. This is done in order to aggregate web clicks by subscriber gender, which only exists in the subscriber table.
Master ID-to-Natural Key Cross-References
Master IDs are created and managed in the CDM-MDM world, but they are often needed for linkage and aggregation in the Big Data world. Shadowing cross-references that map master IDs, such as master individual id, to "source natural keys" into the Big Data world solves this problem.
The two classifications of data are separated into two streams and processed (mostly) independently. How do they come back together? One way this architecture works is that both streams, attributive and behavioral, contain a "source natural key." This is a unique identifier that relates the two streams. For example, web clickstream data typically has an IP address or a web application-managed, cookie-based visitor ID. Transactional data typically has an account number. Mobile data will have a phone number or device ID. These identifiers don't have to mean anything, per se, but are critical for stitching the two streams back together.
It's not just the dimensionalized, aggregated data that is reunited with the profile data, but also the high-value, behavioral analytics attributes (predictive scores, micro-segmentations, etc.) created courtesy of Big Analytics. The attributive data is now greatly enriched by the output of the Big Data processing stream. And, to get things really crazy, these enriched behavioral analytics profile attributes can be used as part of the next cycle of matching; similar, complex behavior patterns can help tip the scales, causing two entities to match that might not have matched otherwise. In the end, CDM-MDM and Big Data can live together harmoniously; Big Data doesn't replace CDM-MDM, but rather extends it.
Continuous testing helps bridge the gap between developing quickly and maintaining high quality products. But to implement continuous testing, CTOs must take a strategic approach to building a testing infrastructure and toolset that empowers their team to move fast. Download our guide to laying the groundwork for a scalable continuous testing strategy.
Jul. 23, 2016 05:00 AM EDT Reads: 1,770
As companies gain momentum, the need to maintain high quality products can outstrip their development team’s bandwidth for QA. Building out a large QA team (whether in-house or outsourced) can slow down development and significantly increases costs. This eBook takes QA profiles from 5 companies who successfully scaled up production without building a large QA team and includes: What to consider when choosing CI/CD tools How culture and communication can make or break implementation
Jul. 23, 2016 05:00 AM EDT Reads: 1,468
SYS-CON Events has announced today that Roger Strukhoff has been named conference chair of Cloud Expo and @ThingsExpo 2016 Silicon Valley. The 19th Cloud Expo and 6th @ThingsExpo will take place on November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. "The Internet of Things brings trillions of dollars of opportunity to developers and enterprise IT, no matter how you measure it," stated Roger Strukhoff. "More importantly, it leverages the power of devices and the Interne...
Jul. 23, 2016 04:30 AM EDT Reads: 1,898
"We formed Formation several years ago to really address the need for bring complete modernization and software-defined storage to the more classic private cloud marketplace," stated Mark Lewis, Chairman and CEO of Formation Data Systems, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Jul. 23, 2016 04:30 AM EDT Reads: 1,375
Machine Learning helps make complex systems more efficient. By applying advanced Machine Learning techniques such as Cognitive Fingerprinting, wind project operators can utilize these tools to learn from collected data, detect regular patterns, and optimize their own operations. In his session at 18th Cloud Expo, Stuart Gillen, Director of Business Development at SparkCognition, discussed how research has demonstrated the value of Machine Learning in delivering next generation analytics to imp...
Jul. 23, 2016 04:00 AM EDT Reads: 2,327
Organizations planning enterprise data center consolidation and modernization projects are faced with a challenging, costly reality. Requirements to deploy modern, cloud-native applications simultaneously with traditional client/server applications are almost impossible to achieve with hardware-centric enterprise infrastructure. Compute and network infrastructure are fast moving down a software-defined path, but storage has been a laggard. Until now.
Jul. 23, 2016 03:45 AM EDT Reads: 1,569
Most organizations prioritize data security only after their data has already been compromised. Proactive prevention is important, but how can you accomplish that on a small budget? Learn how the cloud, combined with a defense and in-depth approach, creates efficiencies by transferring and assigning risk. Security requires a multi-defense approach, and an in-house team may only be able to cherry pick from the essential components. In his session at 19th Cloud Expo, Vlad Friedman, CEO/Founder o...
Jul. 23, 2016 03:45 AM EDT Reads: 1,728
"We host and fully manage cloud data services, whether we store, the data, move the data, or run analytics on the data," stated Kamal Shannak, Senior Development Manager, Cloud Data Services, IBM, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Jul. 23, 2016 03:30 AM EDT Reads: 1,055
With over 720 million Internet users and 40–50% CAGR, the Chinese Cloud Computing market has been booming. When talking about cloud computing, what are the Chinese users of cloud thinking about? What is the most powerful force that can push them to make the buying decision? How to tap into them? In his session at 18th Cloud Expo, Yu Hao, CEO and co-founder of SpeedyCloud, answered these questions and discussed the results of SpeedyCloud’s survey.
Jul. 23, 2016 02:45 AM EDT Reads: 693
In addition to all the benefits, IoT is also bringing new kind of customer experience challenges - cars that unlock themselves, thermostats turning houses into saunas and baby video monitors broadcasting over the internet. This list can only increase because while IoT services should be intuitive and simple to use, the delivery ecosystem is a myriad of potential problems as IoT explodes complexity. So finding a performance issue is like finding the proverbial needle in the haystack.
Jul. 23, 2016 02:45 AM EDT Reads: 2,022
With the proliferation of both SQL and NoSQL databases, organizations can now target specific fit-for-purpose database tools for their different application needs regarding scalability, ease of use, ACID support, etc. Platform as a Service offerings make this even easier now, enabling developers to roll out their own database infrastructure in minutes with minimal management overhead. However, this same amount of flexibility also comes with the challenges of picking the right tool, on the right ...
Jul. 23, 2016 02:30 AM EDT Reads: 728
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and shared the must-have mindsets for removing complexity from the develo...
Jul. 23, 2016 01:15 AM EDT Reads: 1,003
SYS-CON Events announced today that MangoApps will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. MangoApps provides modern company intranets and team collaboration software, allowing workers to stay connected and productive from anywhere in the world and from any device.
Jul. 23, 2016 01:00 AM EDT Reads: 1,141
Redis is not only the fastest database, but it is the most popular among the new wave of databases running in containers. Redis speeds up just about every data interaction between your users or operational systems. In his session at 19th Cloud Expo, Dave Nielsen, Developer Advocate, Redis Labs, will share the functions and data structures used to solve everyday use cases that are driving Redis' popularity.
Jul. 23, 2016 12:30 AM EDT Reads: 1,409
“We're a global managed hosting provider. Our core customer set is a U.S.-based customer that is looking to go global,” explained Adam Rogers, Managing Director at ANEXIA, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Jul. 23, 2016 12:00 AM EDT Reads: 1,518
SYS-CON Events announced today that Isomorphic Software will exhibit at DevOps Summit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Isomorphic Software provides the SmartClient HTML5/AJAX platform, the most advanced technology for building rich, cutting-edge enterprise web applications for desktop and mobile. SmartClient combines the productivity and performance of traditional desktop software with the simp...
Jul. 22, 2016 11:45 PM EDT Reads: 668
"When you think about the data center today, there's constant evolution, The evolution of the data center and the needs of the consumer of technology change, and they change constantly," stated Matt Kalmenson, VP of Sales, Service and Cloud Providers at Veeam Software, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Jul. 22, 2016 11:30 PM EDT Reads: 1,164
SYS-CON Events announced today that LeaseWeb USA, a cloud Infrastructure-as-a-Service (IaaS) provider, will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. LeaseWeb is one of the world's largest hosting brands. The company helps customers define, develop and deploy IT infrastructure tailored to their exact business needs, by combining various kinds cloud solutions.
Jul. 22, 2016 11:15 PM EDT Reads: 959
Early adopters of IoT viewed it mainly as a different term for machine-to-machine connectivity or M2M. This is understandable since a prerequisite for any IoT solution is the ability to collect and aggregate device data, which is most often presented in a dashboard. The problem is that viewing data in a dashboard requires a human to interpret the results and take manual action, which doesn’t scale to the needs of IoT.
Jul. 22, 2016 11:00 PM EDT Reads: 1,785
As organizations shift towards IT-as-a-service models, the need for managing and protecting data residing across physical, virtual, and now cloud environments grows with it. Commvault can ensure protection, access and E-Discovery of your data – whether in a private cloud, a Service Provider delivered public cloud, or a hybrid cloud environment – across the heterogeneous enterprise. In his general session at 18th Cloud Expo, Randy De Meno, Chief Technologist - Windows Products and Microsoft Part...
Jul. 22, 2016 10:45 PM EDT Reads: 1,773