By Dan Smith
August 16, 2012 06:00 AM EDT
Customer engagement has long benefited from data and analytics. Knowing more about each of your customers, their attributes, preferences, behaviors and patterns, is essential to fostering meaningful engagement with them. As technologies advance, and more of people's lives are lived online, more and more data about customers is captured and made available. At face value, this is good; more data means better analytics, which means better understanding of customers and therefore more meaningful engagement. However, volumes of data measured in terabytes, petabytes, and beyond are so big they have spawned the terms "Big Data" and "Big Analytics." At this scale, there are practical considerations that must be understood to successfully reap the benefits for customer engagement. This article will explore some of these considerations and provide some suggestions on how to address them.
Customer Data Management (CDM), also known as Customer Data Integration (CDI), is foundational for a Customer Intelligence (CI) or Customer Engagement (CE) system. CDM is rooted in the principles of Master Data Management (MDM), which includes the following:
- Acquisition and ingestion of multiple, disparate sources, both online and offline, of customer and prospect data
- Change Data Capture (CDC)
- Data cleansing, parsing, and standardization
- Entity Modeling
- Entity relationship and hierarchy management
- Entity matching, identity resolution, and persistent key management for key individual, household, company/institution/location entities
- Rules-based attribute mastering, "Survivorship" or "Build the Best Record"
- Data lineage, version history, audit, aging, and expiration
It's useful to first distinguish between attributive and behavioral data. Attributive data, often referred to as profile data, consists of discrete fields that describe an entity, such as an individual's name, address, age, eye color, and income. Behavioral data is a series of events that describe an entity's behavior over time, such as phone calls, web page visits, and financial transactions. Admittedly, the line between the two is blurry; a customer's current account balance can be treated either as an attribute or as an aggregation of behavioral transactions.
MDM typically focuses on attributive data and, because CDM is based on MDM, so does CDM. Personally Identifying Information (PII) such as name, email, address, phone, and username is the primary driver behind identity resolution. Other attributes, such as income, number of children, or gender, are commonly "mastered" for each of the resolved entities (individual, household, company).
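To make the idea of "mastering" attributes concrete, here is a minimal sketch of one possible survivorship rule. The records, field names, and the "most recent non-empty value wins" rule are all assumptions chosen for illustration; real CDM systems typically support many rule types (source priority, completeness, and so on).

```python
from datetime import date

# Hypothetical source records already resolved to the same individual.
records = [
    {"source": "crm", "updated": date(2012, 3, 1), "email": "pat@example.com", "income": None},
    {"source": "web", "updated": date(2012, 6, 15), "email": "pat.lee@example.com", "income": None},
    {"source": "survey", "updated": date(2011, 11, 20), "email": None, "income": 72000},
]

def build_best_record(records, fields):
    """Assumed rule: for each field, keep the most recently updated non-empty value."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    best = {}
    for field in fields:
        # Take the first non-empty value when scanning from newest to oldest.
        best[field] = next((r[field] for r in newest_first if r[field] is not None), None)
    return best

golden = build_best_record(records, ["email", "income"])
```

Under this assumed rule, the email survives from the newest record while the income survives from the only record that supplies it, so the "best record" draws on more than one source.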
Enter Big Data. As more devices that capture and store data are developed - and adopted - huge quantities of data are generated. Big Data is almost always event-oriented and temporal, and the subset of Big Data that is relevant to a CE system is almost always behavioral in nature (clicks, calls, downloads, purchases, emails, texts, tweets, Facebook posts). Behavioral data is critical to understanding customers (and prospects), and understanding customers is critical for establishing meaningful and welcome engagement with them. Therefore, Big Data is, or should be, viewed as an invaluable asset to any CE system.
Further, this sort of rich, temporal behavioral data is ripe for analytics. In fact, the term Big Analytics has emerged as a result. Big Analytics can be defined as the ability to execute analytics on Big Data. However, there are some real challenges involved in executing analytics on Big Data, challenges that drive the need for specialized technologies such as Hadoop or Netezza (or both). These technologies must support Massively Parallel Processing (MPP) and, just as importantly if not more so, they must bring the analytics to the data instead of bringing the data to the analytics. Having recently completed a course for Hadoop developers (an excellent course that I highly recommend), I have a heightened appreciation for the challenges related to managing and analyzing data "at scale" and the need for specialized technologies that support Big Data and Big Analytics.
A few significant points regarding Big Analytics should be considered:
- Big Analytics allows models to be built on an entire data set, rather than just a sample or an aggregation. My colleague, Jack McCush, explains: "When building models on a small subset and then validating them against a larger set to make sure the assumptions hold, you can miss the ability to predict rare events. And often those rare events are the ones that drive profit."
- Big Analytics allows non-traditional models to be built, for example, social graphs and influencer analytics. Several useful and inherently big sources of data, such as Call Detail Records (CDRs) generated from mobile/smart phones and web clickstream data, lend themselves well to these models.
- Big Analytics can take even traditional analytics to the next level. It allows traditional correlation and clustering models to be executed in a fraction of the time, even with billions of records and hundreds of variables. As Revolution Analytics points out in Advanced 'Big Data' Analytics with R and Hadoop, "Research suggests that a simple algorithm with a large volume of data is more accurate than a sophisticated algorithm with little data. The algorithm is not the competitive advantage; the ability to apply it to huge amounts of data - without compromising performance - generates the competitive advantage."
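The point about rare events in the first bullet can be illustrated with a little arithmetic. The sketch below uses made-up numbers (50 rare records and a 1% sample) and assumes each record is sampled independently with the same probability; it is not drawn from any real data set.

```python
def prob_sample_misses_all(rare_count, sample_fraction):
    """Chance that none of `rare_count` rare records lands in the sample,
    assuming each record is sampled independently with the same probability."""
    return (1.0 - sample_fraction) ** rare_count

# Hypothetical scenario: 50 rare events in the full data set, 1% sample.
p_miss = prob_sample_misses_all(rare_count=50, sample_fraction=0.01)
```

With these assumed numbers, the chance that the sample contains none of the rare events is about 0.6, which is why a model built on the sample may never see the pattern at all.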
Big Data is great for a CE system. It paints a rich behavioral picture of customers and prospects and takes CE-enabling analytics to the next level. But what happens when this massive behavioral data is thrown at a CDM/MDM system that is optimized for attributive data? A "basketball through the garden hose" effect might occur. But this doesn't have to happen; there are ways to gracefully extend CDM to manage Big Data.
The key is data classification. Attributive, or profile, data is classified separately from behavioral data. While both contain a source natural key (e.g., cookie-based visitor ID, cell phone number, device ID, account number), attributive data is always structured. Behavioral data, on the other hand, can be structured or unstructured and contains no PII. Big Data almost always falls under the behavioral category.
Importantly, behavioral data requires different processing than attributive data. Since the processing is different, the two streams can be separated just after ingestion, like a fork in the road, with the attributive data going one way and the behavioral data going the other. This is the key to integrating Big Data into a CDM-MDM system without grinding it to a halt. To be fair, the two streams aren't completely independent. The behavioral stream will typically require two things from the attributive stream: Dimension Tables and Master ID-to-Natural Key Cross-References - both of which can be considered as reference data.
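The "fork in the road" just after ingestion might look something like the following sketch. The classification rule (a record is behavioral if it carries an `event_type` field) and the record layouts are assumptions made purely for illustration; a real system would classify by source or schema.

```python
def classify(record):
    """Assumed rule: event records carry an 'event_type'; profile records do not."""
    return "behavioral" if "event_type" in record else "attributive"

def fork(ingested):
    """Split one ingested feed into the attributive and behavioral streams."""
    streams = {"attributive": [], "behavioral": []}
    for record in ingested:
        streams[classify(record)].append(record)
    return streams

# Hypothetical ingested records sharing the same source natural key.
ingested = [
    {"natural_key": "visitor-42", "name": "Pat Lee", "email": "pat@example.com"},
    {"natural_key": "visitor-42", "event_type": "click", "url": "/pricing"},
    {"natural_key": "visitor-42", "event_type": "download", "url": "/whitepaper"},
]
streams = fork(ingested)
```

Only the attributive stream then needs to pass through the comparatively expensive cleansing, matching, and mastering steps; the much larger behavioral stream can go straight to the Big Data platform.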
Dimension Tables
For example, the "subscriber" dimension table may be required in the Big Data world so that it can be joined to the "web clicks" table. This join makes it possible to aggregate web clicks by subscriber gender, which exists only in the subscriber table.
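As a rough sketch of that join, the snippet below counts web clicks by subscriber gender using plain Python structures. The table contents, the `subscriber_id` column, and the "unknown" bucket for unmatched clicks are hypothetical; at real scale this would be a join executed inside the Big Data platform rather than in application code.

```python
from collections import Counter

# Hypothetical "subscriber" dimension table, keyed by subscriber ID.
subscriber_dim = {
    "sub-1": {"gender": "F"},
    "sub-2": {"gender": "M"},
    "sub-3": {"gender": "F"},
}

# Hypothetical "web clicks" fact table.
web_clicks = [
    {"subscriber_id": "sub-1", "url": "/home"},
    {"subscriber_id": "sub-2", "url": "/home"},
    {"subscriber_id": "sub-1", "url": "/plans"},
    {"subscriber_id": "sub-3", "url": "/plans"},
    {"subscriber_id": "sub-9", "url": "/home"},  # no matching subscriber
]

def clicks_by_gender(clicks, dim):
    """Join each click to the dimension table and count clicks per gender."""
    counts = Counter()
    for click in clicks:
        subscriber = dim.get(click["subscriber_id"])
        gender = subscriber["gender"] if subscriber else "unknown"
        counts[gender] += 1
    return counts

totals = clicks_by_gender(web_clicks, subscriber_dim)
```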
Master ID-to-Natural Key Cross-References
Master IDs are created and managed in the CDM-MDM world, but they are often needed for linkage and aggregation in the Big Data world. The solution is to shadow into the Big Data world the cross-references that map master IDs, such as the master individual ID, to "source natural keys."
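A shadowed cross-reference can be pictured as a simple lookup from (source, natural key) to master ID. In the sketch below, the ID formats, source names, and the decision to skip unresolved events are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical cross-reference shadowed from the CDM-MDM system:
# (source, source natural key) -> master individual ID.
xref = {
    ("web", "visitor-42"): "IND-001",
    ("mobile", "555-0100"): "IND-001",
    ("web", "visitor-77"): "IND-002",
}

events = [
    {"source": "web", "natural_key": "visitor-42", "event_type": "click"},
    {"source": "mobile", "natural_key": "555-0100", "event_type": "call"},
    {"source": "web", "natural_key": "visitor-77", "event_type": "click"},
    {"source": "web", "natural_key": "visitor-99", "event_type": "click"},  # unresolved
]

def events_per_master_id(events, xref):
    """Group event types by master ID; events with no cross-reference are skipped."""
    grouped = defaultdict(list)
    for event in events:
        master_id = xref.get((event["source"], event["natural_key"]))
        if master_id is not None:
            grouped[master_id].append(event["event_type"])
    return dict(grouped)

by_individual = events_per_master_id(events, xref)
```

Here a web click and a mobile call roll up to the same individual because both natural keys map to one master ID, which is the linkage the behavioral stream cannot produce on its own.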
The two classifications of data are separated into two streams and processed (mostly) independently. How do they come back together? The architecture works because both streams, attributive and behavioral, contain a "source natural key," a unique identifier that relates the two streams. For example, web clickstream data typically has an IP address or a web application-managed, cookie-based visitor ID. Transactional data typically has an account number. Mobile data will have a phone number or device ID. These identifiers don't have to mean anything, per se, but they are critical for stitching the two streams back together.
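Stitching the streams back together on the source natural key could be sketched as follows. The account numbers, the `purchases_90d` field, and the choice to leave unmatched profiles unchanged are hypothetical.

```python
# Hypothetical profile records from the attributive stream, keyed by account number.
profiles = {
    "acct-1001": {"name": "Pat Lee"},
    "acct-1002": {"name": "Sam Roy"},
}

# Hypothetical per-key summary produced by the behavioral stream.
behavioral_summary = {
    "acct-1001": {"purchases_90d": 4},
    "acct-1003": {"purchases_90d": 1},  # no profile yet for this key
}

def stitch(profiles, summary):
    """Attach behavioral fields to each profile that shares a source natural key."""
    enriched = {}
    for key, profile in profiles.items():
        enriched[key] = {**profile, **summary.get(key, {})}
    return enriched

enriched_profiles = stitch(profiles, behavioral_summary)
```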
It's not just the dimensionalized, aggregated data that is reunited with the profile data, but also the high-value, behavioral analytics attributes (predictive scores, micro-segmentations, etc.) created courtesy of Big Analytics. The attributive data is now greatly enriched by the output of the Big Data processing stream. And, to get things really crazy, these enriched behavioral analytics profile attributes can be used as part of the next cycle of matching; similar, complex behavior patterns can help tip the scales, causing two entities to match that might not have matched otherwise. In the end, CDM-MDM and Big Data can live together harmoniously; Big Data doesn't replace CDM-MDM, but rather extends it.