|By Dan Smith||
|August 16, 2012 06:00 AM EDT||
Customer engagement has long benefited from data and analytics. Knowing more about each of your customers, their attributes, preferences, behaviors and patterns, is essential to fostering meaningful engagement with them. As technologies advance, and more of people's lives are lived online, more and more data about customers is captured and made available. At face value, this is good; more data means better analytics, which means better understanding of customers and therefore more meaningful engagement. However, volumes of data measured in terabytes, petabytes, and beyond are so big they have spawned the terms "Big Data" and "Big Analytics." At this scale, there are practical considerations that must be understood to successfully reap the benefits for customer engagement. This article will explore some of these considerations and provide some suggestions on how to address them.
Customer Data Management (CDM), also known as Customer Data Integration (CDI), is foundational for a Customer Intelligence (CI) or Customer Engagement (CE) system. CDM is rooted in the principles of Master Data Management (MDM), which includes the following:
- Acquisition and ingestion of multiple, disparate sources, both online and offline, of customer and prospect data
- Change Data Capture (CDC)
- Data cleansing, parsing, and standardization
- Entity Modeling
- Entity relationship and hierarchy management
- Entity matching, identity resolution, and persistent key management for key individual, household, company/institution/location entities
- Rules-based attribute mastering, "Survivorship" or "Build the Best Record"
- Data lineage, version history, audit, aging, and expiration
It's useful to first make the distinction between attributive and behavioral data. Attributive data, often referred to as profile data, is discrete fields that describe an entity such as an individual's name, address, age, eye color, and income. Behavioral data is a series of events that describe an entity's behavior over time, such as phone calls, web page visits, and financial transactions. Admittedly, there is a slippery slope between the two; a customer's current account balance can be either an attribute or an aggregation of behavioral transactions.
MDM typically focuses on attributive data. Being based on MDM, the same is true for CDM. Personally Identifying Information (PII) such as name, email, address, phone, and username are the primary drivers behind identity resolution. Other attributes such as income, number of children, or gender are attributes that are commonly "mastered" for each of the resolved entities (individual, household, company).
Enter Big Data. As more devices are developed - and adopted - that capture and store data, huge quantities of data are generated. Big Data, by definition, is almost always event-oriented and temporal, and the subset of Big Data that is relevant to a CE system is almost always behavioral in nature (clicks, calls, downloads, purchases, emails, texts, tweets, Facebook posts). Behavioral data is critical to understanding customers (and prospects). And, understanding customers is critical for establishing meaningful and welcome engagement with them. Therefore, Big Data is, or should be, viewed as an invaluable asset to any CE system.
Further, this sort of rich, temporal behavioral data is ripe for analytics. In fact, the term Big Analytics has emerged as a result. Big Analytics can be defined as the ability to execute analytics on Big Data. However, there are some real challenges involved in executing analytics on Big Data, challenges that drive the need for specialized technologies such as Hadoop or Netezza (or both). These technologies must support Massively Parallel Processing (MPP) and, just as importantly if not more so, they must bring the analytics to the data instead of bringing the data to the analytics. Having recently completed a course for Hadoop developers (an excellent course that I highly recommend), I have a heightened appreciation for the challenges related to managing and analyzing data "at scale" and the need for specialized technologies that support Big Data and Big Analytics.
A few significant points regarding Big Analytics should be considered:
- Big Analytics allow the build of models on an entire data set, rather than just a sampling or an aggregation. My colleague, Jack McCush, explains: "When building models on a small subset and then validating them against a larger set to make sure the assumptions hold, you can miss the ability to predict rare events. And often those rare events are the ones that drive profit."
- Big Analytics allow the build of non-traditional models, for example, social graphs and influencer analytics. Several useful and inherently big sources of data such as Call Detail Records (CDRs) generated from mobile/smart phones and web clickstream data both lend themselves well to these models.
- Big Analytics can take even traditional analytics to the next level. Big Analytics allows the execution of traditional correlation and clustering models in a fraction of the time, even with billions of records and hundreds of variables. As Revolution Analytics points out in Advanced 'Big Data' Analytics with R and Hadoop, "Research suggests that a simple algorithm with a large volume of data is more accurate than a sophisticated algorithm with little data. The algorithm is not the competitive advantage; the ability to apply it to huge amounts of data-without compromising performance-generates the competitive advantage."
Big Data is great for a CE system. It paints a rich behavioral picture of customers and prospects and takes CE-enabling analytics to the next level. But what happens when this massive behavioral data is thrown at a CDM/MDM system that is optimized for attributive data? A "basketball through the garden hose" effect might occur. But this doesn't have to happen; there are ways to gracefully extend CDM to manage Big Data.
The key is data classification. Attributive, or profile, data is classified separately from behavioral data. While both contain Source Native Key (e.g., cookie-based visitor id, cell phone number, device id, account number), attributive data can be structured only. Behavioral data, on the other hand, can be structured and unstructured and contains no PII. Big Data almost always falls under the behavioral category.
Importantly, behavioral data requires different processing than attributive data. Since the processing is different, the two streams can be separated just after ingestion, like a fork in the road, with the attributive data going one way and the behavioral data going the other. This is the key to integrating Big Data into a CDM-MDM system without grinding it to a halt. To be fair, the two streams aren't completely independent. The behavioral stream will typically require two things from the attributive stream: Dimension Tables and Master ID-to-Natural Key Cross-References - both of which can be considered as reference data.
For example, the "subscriber" dimension table may be required in the Big Data world so that it can be joined to the "web clicks" table. This is done in order to aggregate web clicks by subscriber gender, which only exists in the subscriber table.
Master ID-to-Natural Key Cross-References
Master IDs are created and managed in the CDM-MDM world, but they are often needed for linkage and aggregation in the Big Data world. Shadowing cross-references that map master IDs, such as master individual id, to "source natural keys" into the Big Data world solves this problem.
The two classifications of data are separated into two streams and processed (mostly) independently. How do they come back together? One way this architecture works is that both streams, attributive and behavioral, contain a "source natural key." This is a unique identifier that relates the two streams. For example, web clickstream data typically has an IP address or a web application-managed, cookie-based visitor ID. Transactional data typically has an account number. Mobile data will have a phone number or device ID. These identifiers don't have to mean anything, per se, but are critical for stitching the two streams back together.
It's not just the dimensionalized, aggregated data that is reunited with the profile data, but also the high-value, behavioral analytics attributes (predictive scores, micro-segmentations, etc.) created courtesy of Big Analytics. The attributive data is now greatly enriched by the output of the Big Data processing stream. And, to get things really crazy, these enriched behavioral analytics profile attributes can be used as part of the next cycle of matching; similar, complex behavior patterns can help tip the scales, causing two entities to match that might not have matched otherwise. In the end, CDM-MDM and Big Data can live together harmoniously; Big Data doesn't replace CDM-MDM, but rather extends it.
Software development, like manufacturing, is a craft that requires the application of creative approaches to solve problems given a wide range of constraints. However, while engineering design may be craftwork, the production of most designed objects relies on a standardized and automated manufacturing process. By contrast, much of moving an application from prototype to production and, indeed, maintaining the application through its lifecycle has often remained craftwork. In his session at Dev...
May. 29, 2015 12:00 PM EDT Reads: 1,476
Amazon, Google and Facebook are household names in part because of their mastery of Big Data. But what about organizations without billions of dollars to spend on Big Data tools - how can they extract value from their data? In his session at 6th Big Data Expo®, Ali Ghodsi, Co-Founder and Head of Engineering at Databricks, discussed how the zero management cost and scalability of the cloud is addressing the challenges and pain points that data engineers face when working with Big Data. He also s...
May. 29, 2015 12:00 PM EDT Reads: 4,287
The 4th International Internet of @ThingsExpo, co-located with the 17th International Cloud Expo - to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than 20 years ago.
May. 29, 2015 12:00 PM EDT Reads: 1,736
An entirely new security model is needed for the Internet of Things, or is it? Can we save some old and tested controls for this new and different environment? In his session at @ThingsExpo, New York's at the Javits Center, Davi Ottenheimer, EMC Senior Director of Trust, reviewed hands-on lessons with IoT devices and reveal a new risk balance you might not expect. Davi Ottenheimer, EMC Senior Director of Trust, has more than nineteen years' experience managing global security operations and asse...
May. 29, 2015 11:00 AM EDT Reads: 5,698
The Internet of Things is a misnomer. That implies that everything is on the Internet, and that simply should not be - especially for things that are blurring the line between medical devices that stimulate like a pacemaker and quantified self-sensors like a pedometer or pulse tracker. The mesh of things that we manage must be segmented into zones of trust for sensing data, transmitting data, receiving command and control administrative changes, and peer-to-peer mesh messaging. In his session a...
May. 29, 2015 11:00 AM EDT Reads: 4,009
SYS-CON Events announced today that EnterpriseDB (EDB), the leading worldwide provider of enterprise-class Postgres products and database compatibility solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. EDB is the largest provider of Postgres software and services that provides enterprise-class performance and scalability and the open source freedom to divert budget from more costly traditiona...
May. 29, 2015 11:00 AM EDT Reads: 1,588
The multi-trillion economic opportunity around the "Internet of Things" (IoT) is emerging as the hottest topic for investors in 2015. As we connect the physical world with information technology, data from actions, processes and the environment can increase sales, improve efficiencies, automate daily activities and minimize risk. In his session at @ThingsExpo, Ed Maguire, Senior Analyst at CLSA Americas, will describe what is new and different about IoT, explore financial, technological and re...
May. 29, 2015 10:50 AM EDT Reads: 273
Andi Mann has been serving as Conference Chair of the DevOps Summit since its inception. He is one of the world's recognized leaders in DevOps, and continues to be one of its most articulate advocates. Here are some recent thoughts of his in an interview we conducted in the run-up to the DevOps Summit to be held June 9-11 at the Javits Center in New York City. When did you first start thinking about DevOps and its potential impact on enterprise IT? Andi: I first started thinking about DevOps b...
May. 29, 2015 10:33 AM EDT Reads: 329
The OpenStack cloud operating system includes Trove, a database abstraction layer. Rather than applications connecting directly to a specific type of database, they connect to Trove, which in turn connects to one or more specific databases. One target database is Postgres Plus Cloud Database, which includes its own RESTful API. Trove was originally developed around MySQL, whose interfaces are significantly less complicated than those of the Postgres cloud database. In his session at 16th Cloud...
May. 29, 2015 10:00 AM EDT Reads: 1,217
Today’s enterprise is being driven by disruptive competitive and human capital requirements to provide enterprise application access through not only desktops, but also mobile devices. To retrofit existing programs across all these devices using traditional programming methods is very costly and time consuming – often prohibitively so. In his session at @ThingsExpo, Jesse Shiah, CEO, President, and Co-Founder of AgilePoint Inc., discussed how you can create applications that run on all mobile ...
May. 29, 2015 10:00 AM EDT Reads: 5,453
We are all here because we are sold on the transformative promise of The Cloud. But what good is all of this ephemeral, on-demand infrastructure if your usage doesn't actually improve the agility and speed of your business? How must Operations adapt in order to avoid stifling your Cloud initiative? In his session at DevOps Summit, Damon Edwards, co-founder and managing partner of the DTO Solutions, will highlight the successful organizational, process, and tooling patterns of high-performing c...
May. 29, 2015 10:00 AM EDT Reads: 5,377
The 5th International DevOps Summit, co-located with 17th International Cloud Expo – being held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the...
May. 29, 2015 10:00 AM EDT Reads: 4,091
The web app is Agile. The REST API is Agile. The testing and planning are Agile. But alas, Data infrastructures certainly are not. Once an application matures, changing the shape or indexing scheme of data often forces at best a top down planning exercise and at worst includes schema changes which force downtime. The time has come for a new approach that fundamentally advances the agility of distributed data infrastructures. Come learn about a new solution to the problems faced by software orga...
May. 29, 2015 10:00 AM EDT Reads: 622
There will be 150 billion connected devices by 2020. New digital businesses have already disrupted value chains across every industry. APIs are at the center of the digital business. You need to understand what assets you have that can be exposed digitally, what their digital value chain is, and how to create an effective business model around that value chain to compete in this economy. No enterprise can be complacent and not engage in the digital economy. Learn how to be the disruptor and not ...
May. 29, 2015 09:45 AM EDT Reads: 650
In their general session at 16th Cloud Expo, Michael Piccininni, Global Account Manager – Cloud SP at EMC Corporation, and Mike Dietze, Regional Director at Windstream Hosted Solutions, will review next generation cloud services, including the Windstream-EMC Tier Storage solutions, and discuss how to increase efficiencies, improve service delivery and enhance corporate cloud solution development. Speaker Bios Michael Piccininni is Global Account Manager – Cloud SP at EMC Corporation. He has b...
May. 29, 2015 09:45 AM EDT Reads: 1,218
In a recent research, analyst firm IDC found that the average cost of a critical application failure is $500,000 to $1 million per hour and the average total cost of unplanned application downtime is $1.25 billion to $2.5 billion per year for Fortune 1000 companies. In addition to the findings on the cost of the downtime, the research also highlighted best practices for development, testing, application support, infrastructure, and operations teams.
May. 29, 2015 09:15 AM EDT Reads: 1,270
There is no question that the cloud is where businesses want to host data. Until recently hypervisor virtualization was the most widely used method in cloud computing. Recently virtual containers have been gaining in popularity, and for good reason. In the debate between virtual machines and containers, the latter have been seen as the new kid on the block – and like other emerging technology have had some initial shortcomings. However, the container space has evolved drastically since coming on...
May. 29, 2015 09:15 AM EDT Reads: 1,371
SYS-CON Events announced today that MetraTech, now part of Ericsson, has been named “Silver Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Ericsson is the driving force behind the Networked Society- a world leader in communications infrastructure, software and services. Some 40% of the world’s mobile traffic runs through networks Ericsson has supplied, serving more than 2.5 billion subscribers.
May. 29, 2015 09:00 AM EDT Reads: 1,450
Enterprises are fast realizing the importance of integrating SaaS/Cloud applications, API and on-premises data and processes, to unleash hidden value. This webinar explores how managers can use a Microservice-centric approach to aggressively tackle the unexpected new integration challenges posed by proliferation of cloud, mobile, social and big data projects. Industry analyst and SOA expert Jason Bloomberg will strip away the hype from microservices, and clearly identify their advantages and d...
May. 29, 2015 09:00 AM EDT Reads: 1,532
The Domain Name Service (DNS) is one of the most important components in networking infrastructure, enabling users and services to access applications by translating URLs (names) into IP addresses (numbers). Because every icon and URL and all embedded content on a website requires a DNS lookup loading complex sites necessitates hundreds of DNS queries. In addition, as more internet-enabled ‘Things' get connected, people will rely on DNS to name and find their fridges, toasters and toilets. Acco...
May. 29, 2015 09:00 AM EDT Reads: 5,205