Cloud Expo: Article

Applying Big Data and Big Analytics to Customer Engagement

Practical considerations

Customer engagement has long benefited from data and analytics. Knowing more about each of your customers, their attributes, preferences, behaviors and patterns, is essential to fostering meaningful engagement with them. As technologies advance, and more of people's lives are lived online, more and more data about customers is captured and made available. At face value, this is good; more data means better analytics, which means better understanding of customers and therefore more meaningful engagement. However, volumes of data measured in terabytes, petabytes, and beyond are so big they have spawned the terms "Big Data" and "Big Analytics." At this scale, there are practical considerations that must be understood to successfully reap the benefits for customer engagement. This article will explore some of these considerations and provide some suggestions on how to address them.

Customer Data Management (CDM), also known as Customer Data Integration (CDI), is foundational for a Customer Intelligence (CI) or Customer Engagement (CE) system. CDM is rooted in the principles of Master Data Management (MDM), which includes the following:

  • Acquisition and ingestion of multiple, disparate sources, both online and offline, of customer and prospect data
  • Change Data Capture (CDC)
  • Data cleansing, parsing, and standardization
  • Entity Modeling
  • Entity relationship and hierarchy management
  • Entity matching, identity resolution, and persistent key management for key individual, household, company/institution/location entities
  • Rules-based attribute mastering, "Survivorship" or "Build the Best Record"
  • Data lineage, version history, audit, aging, and expiration
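The "Build the Best Record" item above can be made concrete with a small sketch. It assumes one possible survivorship rule, which the article does not prescribe: for each attribute, keep the most recently updated non-empty value, and break ties by source priority. The source names, field names, and priorities are hypothetical.

```python
from datetime import date

# Hypothetical source priority: a lower number wins when update dates tie.
SOURCE_PRIORITY = {"crm": 0, "web_signup": 1, "purchased_list": 2}

def build_best_record(records):
    """Merge records already matched to one entity into one mastered record.

    Assumed rule (not from the article): per attribute, keep the most
    recently updated non-empty value; on equal dates, prefer the
    higher-priority source.
    """
    best = {}     # attribute -> surviving value
    lineage = {}  # attribute -> (sort key, source) of the surviving value
    for rec in records:
        # Newer date wins first; then the better (lower) priority number.
        key = (rec["updated"], -SOURCE_PRIORITY[rec["source"]])
        for attr, value in rec["attributes"].items():
            if value in (None, ""):
                continue  # empty values never survive
            if attr not in lineage or key > lineage[attr][0]:
                best[attr] = value
                lineage[attr] = (key, rec["source"])
    # Return the mastered record plus a simple attribute -> source lineage map.
    return best, {attr: source for attr, (_, source) in lineage.items()}

records = [
    {"source": "crm", "updated": date(2014, 1, 10),
     "attributes": {"email": "pat@example.com", "phone": "555-0100"}},
    {"source": "web_signup", "updated": date(2014, 6, 2),
     "attributes": {"email": "pat.r@example.com", "phone": ""}},
]
best, lineage = build_best_record(records)
```

Here the newer email survives from the web signup, while the phone number survives from the CRM record because the newer record left it empty. Keeping the lineage map alongside the mastered values is one simple way to support the audit and version-history items in the list.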

It's useful to first distinguish attributive data from behavioral data. Attributive data, often referred to as profile data, consists of discrete fields that describe an entity, such as an individual's name, address, age, eye color, and income. Behavioral data is a series of events that describe an entity's behavior over time, such as phone calls, web page visits, and financial transactions. Admittedly, the line between the two is blurry; a customer's current account balance can be viewed either as an attribute or as an aggregation of behavioral transactions.
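The account balance example can be sketched in a few lines. The class and field names below are hypothetical; the point is only that the same number can be stored as a profile field or derived by aggregating events.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Profile:
    """Attributive data: discrete fields describing an entity."""
    account_number: str
    name: str
    balance: float  # the balance stored as an attribute

@dataclass
class Transaction:
    """Behavioral data: one event in an entity's history."""
    account_number: str
    at: datetime
    amount: float  # positive = deposit, negative = withdrawal

def balance_from_events(events, account_number):
    """Derive the balance by aggregating behavioral events."""
    return sum(e.amount for e in events if e.account_number == account_number)

events = [
    Transaction("A-7", datetime(2014, 3, 1, 9, 0), 100.0),
    Transaction("A-7", datetime(2014, 3, 2, 14, 30), -30.0),
    Transaction("B-2", datetime(2014, 3, 2, 15, 0), 55.0),
]
profile = Profile("A-7", "Pat Example", balance=70.0)
derived = balance_from_events(events, "A-7")
```

In this sketch `profile.balance` and `derived` hold the same value, reached by two different routes.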

MDM typically focuses on attributive data and, because CDM is based on MDM, so does CDM. Personally Identifying Information (PII) such as name, email, address, phone, and username is the primary driver behind identity resolution. Other attributes, such as income, number of children, or gender, are commonly "mastered" for each of the resolved entities (individual, household, company).

Enter Big Data. As more devices are developed - and adopted - that capture and store data, huge quantities of data are generated. Big Data, by definition, is almost always event-oriented and temporal, and the subset of Big Data that is relevant to a CE system is almost always behavioral in nature (clicks, calls, downloads, purchases, emails, texts, tweets, Facebook posts). Behavioral data is critical to understanding customers (and prospects). And, understanding customers is critical for establishing meaningful and welcome engagement with them. Therefore, Big Data is, or should be, viewed as an invaluable asset to any CE system.

Further, this sort of rich, temporal behavioral data is ripe for analytics. In fact, the term Big Analytics has emerged as a result. Big Analytics can be defined as the ability to execute analytics on Big Data. However, there are some real challenges involved in executing analytics on Big Data, challenges that drive the need for specialized technologies such as Hadoop or Netezza (or both). These technologies must support Massively Parallel Processing (MPP) and, just as importantly if not more so, they must bring the analytics to the data instead of bringing the data to the analytics. Having recently completed a course for Hadoop developers (an excellent course that I highly recommend), I have a heightened appreciation for the challenges related to managing and analyzing data "at scale" and the need for specialized technologies that support Big Data and Big Analytics.

A few significant points regarding Big Analytics should be considered:

  1. Big Analytics allows models to be built on an entire data set, rather than just a sample or an aggregation. My colleague, Jack McCush, explains: "When building models on a small subset and then validating them against a larger set to make sure the assumptions hold, you can miss the ability to predict rare events. And often those rare events are the ones that drive profit."
  2. Big Analytics allows non-traditional models to be built, for example, social graphs and influencer analytics. Several useful and inherently big sources of data, such as Call Detail Records (CDRs) generated from mobile/smart phones and web clickstream data, lend themselves well to these models.
  3. Big Analytics can take even traditional analytics to the next level. Big Analytics allows the execution of traditional correlation and clustering models in a fraction of the time, even with billions of records and hundreds of variables. As Revolution Analytics points out in Advanced 'Big Data' Analytics with R and Hadoop, "Research suggests that a simple algorithm with a large volume of data is more accurate than a sophisticated algorithm with little data. The algorithm is not the competitive advantage; the ability to apply it to huge amounts of data-without compromising performance-generates the competitive advantage."
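The first point, about rare events, can be illustrated with simple arithmetic. The sketch below assumes each sampled row independently shows the rare event with a fixed probability, which is a simplification; the event rate and sample sizes are made-up numbers, not figures from the article.

```python
def prob_sample_misses_event(event_rate, sample_size):
    """Probability that a sample contains zero occurrences of a rare event,
    assuming each sampled row independently shows it with probability
    event_rate (a simplifying assumption)."""
    return (1.0 - event_rate) ** sample_size

# Hypothetical numbers: an event seen once per million rows.
rate = 1e-6
p_small = prob_sample_misses_event(rate, 100_000)      # a 100,000-row sample
p_full = prob_sample_misses_event(rate, 100_000_000)   # all 100 million rows
```

Under these assumptions the 100,000-row sample misses the event entirely about nine times out of ten, while the full data set almost never does.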

Big Data is great for a CE system. It paints a rich behavioral picture of customers and prospects and takes CE-enabling analytics to the next level. But what happens when this massive behavioral data is thrown at a CDM-MDM system that is optimized for attributive data? A "basketball through the garden hose" effect might occur. But this doesn't have to happen; there are ways to gracefully extend CDM to manage Big Data.

The key is data classification. Attributive, or profile, data is classified separately from behavioral data. While both contain a source natural key (e.g., cookie-based visitor ID, cell phone number, device ID, account number), attributive data is always structured. Behavioral data, on the other hand, can be structured or unstructured and contains no PII. Big Data almost always falls under the behavioral category.

Importantly, behavioral data requires different processing than attributive data. Since the processing is different, the two streams can be separated just after ingestion, like a fork in the road, with the attributive data going one way and the behavioral data going the other. This is the key to integrating Big Data into a CDM-MDM system without grinding it to a halt. To be fair, the two streams aren't completely independent. The behavioral stream will typically require two things from the attributive stream: Dimension Tables and Master ID-to-Natural Key Cross-References - both of which can be considered as reference data.
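The "fork in the road" can be sketched as a simple router that runs right after ingestion. The record types and the `type` field are hypothetical; a real system might classify by feed, schema, or source instead.

```python
# Hypothetical record types used to classify incoming data.
ATTRIBUTIVE_TYPES = {"profile", "account"}
BEHAVIORAL_TYPES = {"web_click", "call_detail", "purchase"}

def fork(records):
    """Split ingested records into attributive and behavioral streams."""
    attributive, behavioral = [], []
    for rec in records:
        if rec["type"] in ATTRIBUTIVE_TYPES:
            attributive.append(rec)   # goes to CDM-MDM processing
        elif rec["type"] in BEHAVIORAL_TYPES:
            behavioral.append(rec)    # goes to Big Data processing
        else:
            raise ValueError(f"unclassified record type: {rec['type']!r}")
    return attributive, behavioral

ingested = [
    {"type": "profile", "key": "A-7", "name": "Pat Example"},
    {"type": "web_click", "key": "v-81", "url": "/pricing"},
    {"type": "call_detail", "key": "+15550100", "seconds": 42},
]
attributive_stream, behavioral_stream = fork(ingested)
```

Rejecting unclassified records rather than guessing a stream is a design choice of this sketch, not something the article specifies.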

Dimension Tables
For example, the "subscriber" dimension table may be required in the Big Data world so that it can be joined to the "web clicks" table. This is done in order to aggregate web clicks by subscriber gender, which only exists in the subscriber table.
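A minimal sketch of that join, using plain Python in place of a Big Data engine. The table contents and column names are hypothetical, and clicks with no matching subscriber are counted as "unknown", which is an assumption of this sketch.

```python
from collections import Counter

def clicks_by_gender(web_clicks, subscriber_dim):
    """Join clicks to the subscriber dimension and count clicks per gender."""
    counts = Counter()
    for click in web_clicks:
        subscriber = subscriber_dim.get(click["subscriber_id"])
        # Gender lives only in the dimension table, not in the click events.
        gender = subscriber["gender"] if subscriber else "unknown"
        counts[gender] += 1
    return dict(counts)

# Hypothetical "subscriber" dimension table, keyed by subscriber id.
subscriber_dim = {
    "s1": {"gender": "F"},
    "s2": {"gender": "M"},
}
# Hypothetical "web clicks" events.
web_clicks = [
    {"subscriber_id": "s1", "url": "/home"},
    {"subscriber_id": "s1", "url": "/pricing"},
    {"subscriber_id": "s2", "url": "/home"},
    {"subscriber_id": "s9", "url": "/home"},  # no matching subscriber
]
totals = clicks_by_gender(web_clicks, subscriber_dim)
```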

Master ID-to-Natural Key Cross-References
Master IDs are created and managed in the CDM-MDM world, but they are often needed for linkage and aggregation in the Big Data world. The solution is to shadow into the Big Data world the cross-references that map master IDs, such as the master individual ID, to "source natural keys."
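As a rough sketch, a shadowed cross-reference can be as simple as a lookup from (key type, key value) to master ID. The IDs, key types, and event shapes below are hypothetical.

```python
from collections import defaultdict

# Hypothetical cross-reference copied from CDM-MDM:
# (key type, source natural key) -> master individual ID.
XREF = {
    ("visitor_id", "v-81"): "M1",
    ("phone", "+15550100"): "M1",
    ("account", "A-7"): "M2",
}

def events_by_master_id(events, xref):
    """Group behavioral events from different sources under one master ID."""
    grouped = defaultdict(list)
    for ev in events:
        master_id = xref.get((ev["key_type"], ev["key"]))
        if master_id is None:
            continue  # keys not yet resolved are skipped in this sketch
        grouped[master_id].append(ev["event"])
    return dict(grouped)

events = [
    {"key_type": "visitor_id", "key": "v-81", "event": "click:/pricing"},
    {"key_type": "phone", "key": "+15550100", "event": "call:42s"},
    {"key_type": "account", "key": "A-7", "event": "purchase:19.99"},
    {"key_type": "visitor_id", "key": "v-00", "event": "click:/home"},
]
linked = events_by_master_id(events, XREF)
```

The web click and the phone call carry different natural keys, yet both land under master ID `M1` because the cross-reference says they belong to the same individual.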

The two classifications of data are separated into two streams and processed (mostly) independently. How do they come back together? The architecture works because both streams, attributive and behavioral, contain a "source natural key": a unique identifier that relates the two streams. For example, web clickstream data typically has an IP address or a web application-managed, cookie-based visitor ID. Transactional data typically has an account number. Mobile data will have a phone number or device ID. These identifiers don't have to mean anything, per se, but are critical for stitching the two streams back together.

It's not just the dimensionalized, aggregated data that is reunited with the profile data, but also the high-value, behavioral analytics attributes (predictive scores, micro-segmentations, etc.) created courtesy of Big Analytics. The attributive data is now greatly enriched by the output of the Big Data processing stream. And, to get things really crazy, these enriched behavioral analytics profile attributes can be used as part of the next cycle of matching; similar, complex behavior patterns can help tip the scales, causing two entities to match that might not have matched otherwise. In the end, CDM-MDM and Big Data can live together harmoniously; Big Data doesn't replace CDM-MDM, but rather extends it.
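The reunion step can be sketched as a keyed merge. The attribute names (`churn_score`, `segment`) and the `source_natural_key` field are hypothetical stand-ins for whatever the Big Analytics stream actually produces.

```python
def enrich_profiles(profiles, behavioral_attrs):
    """Attach behavioral analytics attributes to the profiles that share
    the same source natural key. Returns new dicts; inputs are not mutated."""
    enriched = []
    for profile in profiles:
        extra = behavioral_attrs.get(profile["source_natural_key"], {})
        enriched.append({**profile, **extra})
    return enriched

# Output of the attributive stream (hypothetical).
profiles = [
    {"source_natural_key": "A-7", "name": "Pat Example"},
    {"source_natural_key": "B-2", "name": "Sam Sample"},
]
# Output of the behavioral stream, keyed by source natural key (hypothetical).
behavioral_attrs = {
    "A-7": {"churn_score": 0.82, "segment": "frequent-browser"},
}
result = enrich_profiles(profiles, behavioral_attrs)
```

Profiles with no behavioral output pass through unchanged, so the enriched records can feed the next matching cycle without special handling.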

More Stories By Dan Smith

Dan Smith is a seasoned technical architect with 25 years of experience designing, developing and delivering innovative software and hardware systems.

In his role as Chief Architect at Quaero, Dan is responsible for the architectural integrity of Quaero's Intelligent Engagement platform, focusing on the capability, flexibility, scalability and fitness of purpose of the platform for Quaero's Customer Engagement hosted solutions. Dan's current focus is on development of the Quaero Big Data Management Platform (BDMP) which integrates the principles of Master Data Management and Big Data Management into a single data management platform.

Before joining Quaero, Dan spent 13 years with a Marketing Service Provider startup, where he served as Chief Architect and was instrumental in building the company's customer data management and advanced trigger marketing platforms - both of which contributed to substantial growth for the company, leading ultimately to its acquisition. Prior to that, Dan spent 11 years with IBM in various hardware and software design and development positions. While at IBM, Dan received two Outstanding Technical Achievement awards and published two IBM Technical Disclosure Bulletins. Dan earned an Electrical Engineering degree from the Rutgers College of Engineering.
