Monday 29 December 2014

Chapter 6: Customer Relationship Management

As data is created on the operational side of the CRM equation, we obviously
need to store and analyze the historical metrics resulting from our customer
interaction and transaction systems. Sounds familiar, doesn’t it? The data
warehouse sits at the core of CRM. It serves as the repository to collect and
integrate the breadth of customer information found in our operational systems,
as well as from external sources. The data warehouse is the foundation
that supports the panoramic 360-degree view of our customers, including customer
data from the following typical sources: transactional data, interaction
data (solicitations, call center), demographic and behavioral data (typically
augmented by third parties), and self-provided profile data.
Analytic CRM is enabled via accurate, integrated, and accessible customer
data in the warehouse. We are able to measure the effectiveness of decisions
made in the past in order to optimize future interactions. Customer data can be
leveraged to better identify up-sell and cross-sell opportunities, pinpoint inefficiencies,
generate demand, and improve retention. In addition, we can leverage
the historical, integrated data to generate models or scores that close the
loop back to the operational world. Recalling the major components of a warehouse
environment from Chapter 1, we can envision the model results pushed
back to where the relationship is operationally managed (for example, sales
rep, call center, or Web site), as illustrated in Figure 6.1. The model output can
translate into specific proactive or reactive tactics recommended for the next
point of customer contact, such as the appropriate next product offer or anti-attrition
response. The model results also are retained in the data warehouse for subsequent
analysis.

Obviously, as the organization becomes more centered on the customer, so
must the data warehouse. CRM inevitably will drive change in the data warehouse.
Data warehouses will grow even more rapidly as we collect more and
more information about our customers, especially from front-office sources
such as the field force. Our data staging processes will grow more complicated
as we match and integrate data from multiple sources. Most important, the
need for a conformed customer dimension becomes even more paramount.

Packaged CRM
In response to the urgent business need for CRM, project teams may be wrestling
with a buy-versus-build decision. In the long run, the build approach
may match the organization’s requirements better than the packaged application,
but the implementation likely will take longer and require more
resources, potentially at a higher cost. Buying a packaged application will
deliver a practically ready-to-go solution, but it may not focus on the integration
and interface issues needed for it to function in the larger IT context. Fortunately,
some providers are supporting common data interchange through
Extensible Markup Language (XML), publishing their data specifications so
that IT can extract dimension and fact data, and supporting customer-specific
conformed dimensions.

Customer Dimension
The conformed customer dimension is a critical element for effective CRM. A
well-maintained, well-deployed conformed customer dimension is the cornerstone
of sound customer-centric analysis.
The customer dimension is typically the most challenging dimension for any
data warehouse. In a large organization, the customer dimension can be
extremely deep (with millions of rows), extremely wide (with dozens or even
hundreds of attributes), and sometimes subject to rather rapid change. One
leading direct marketer maintains over 3,000 attributes about its customers.
Any organization that deals with the general public needs an individual
human being dimension. The biggest retailers, credit card companies, and
government agencies have monster customer dimensions whose sizes exceed
100 million rows. To further complicate matters, the customer dimension often
represents an amalgamation of data from multiple internal and external source
systems.

Dimension Outriggers for a Low-Cardinality Attribute Set
As we said in Chapter 2, a dimension is said to be snowflaked when the low-cardinality
columns in the dimension have been removed to separate normalized
tables that then link back into the original dimension table. Generally,
snowflaking is not recommended in a data warehouse environment because it
almost always makes the user presentation more complex, in addition to having
a negative impact on browsing performance. Despite this prohibition
against snowflaking, there are some situations where you should build a
dimension outrigger that has the appearance of a snowflaked table. Outriggers
have special characteristics that cause them to be permissible snowflakes.

In Figure 6.3, the dimension outrigger is a set of data from an external data
provider consisting of 150 demographic and socioeconomic attributes regarding
the customers’ county of residence. The data for all customers residing in a given
county is identical. Rather than repeating this large block of data for every customer
within a county, we opt to model it as an outrigger. There are several factors
that cause us to bend our no-snowflake rule. First of all, the demographic
data is available at a significantly different grain than the primary dimension
data (county versus individual customer). The data is administered and loaded
at different times than the rest of the data in the customer dimension. Also, we
really do save significant space in this case if the underlying customer dimension
is large. If you have a query tool that insists on a classic star schema with no
snowflakes, you can hide the outrigger under a view declaration.
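For instance, such a view might be declared along the following lines, where
customer_dimension, county_demographics, and the two demographic columns shown are
illustrative names standing in for the roughly 150 purchased attributes:

-- Hide the county demographics outrigger behind a flat customer view.
-- All table and column names here are illustrative assumptions.
CREATE VIEW customer_flat AS
SELECT c.customer_key,
       c.customer_name,
       c.county_name,
       cd.total_population,
       cd.median_household_income
FROM   customer_dimension c,
       county_demographics cd
WHERE  c.county_key = cd.county_key;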
Dimension outriggers are permissible, but they should be the exception rather than
the rule. A red warning flag should go up if your design is riddled with outriggers;
you may have succumbed to the temptation to overly normalize the design.

Large Changing Customer Dimensions
Multimillion-row customer dimensions present two unique challenges that
warrant special treatment. Even if a clean, flat dimension table has been implemented,
it generally takes too long to constrain or browse among the relationships
in such a big table. In addition, it is difficult to use our tried-and-true
techniques from Chapter 4 for tracking changes in these large dimensions. We
probably don’t want to use the type 2 slowly changing dimension technique
and add more rows to a customer dimension that already has millions of rows
in it. Unfortunately, huge customer dimensions are even more likely to change
than moderately sized dimensions. We sometimes call this situation a rapidly
changing monster dimension!

Minidimensions
Business users often want to track the myriad of customer attribute changes.
In some businesses, tracking change is not merely a nice-to-have analytic capability.
Insurance companies, for example, must update information about their
customers and their specific insured automobiles or homes because it is critical
to have an accurate picture of these dimensions when a policy is approved
or claim is made.
Fortunately, a single technique comes to the rescue to address both the browsing-
performance and change-tracking challenges. The solution is to break off
frequently analyzed or frequently changing attributes into a separate dimension,
referred to as a minidimension. For example, we could create a separate
minidimension for a package of demographic attributes, such as age, gender,
number of children, and income level, presuming that these columns get used
extensively. There would be one row in this minidimension for each unique
combination of age, gender, number of children, and income level encountered
in the data, not one row per customer. These columns are the ones that
are analyzed to select an interesting subset of the customer base. In addition,
users want to track changes to these attributes. We leave behind more constant
or less frequently queried attributes in the original huge customer table.
Sample rows for a demographic minidimension are illustrated in Table 6.3.
When creating the minidimension, continuously variable attributes, such as
income and total purchases, should be converted to banded ranges. In other
words, we force the attributes in the minidimension to take on a relatively small
number of discrete values. Although this restricts use to a set of predefined
bands, it drastically reduces the number of combinations in the minidimension.
If we stored income at a specific dollar and cents value in the minidimension,
when combined with the other demographic attributes, we could end up with as
many rows in the minidimension as in the main customer dimension itself. The
use of band ranges is probably the most significant compromise associated
with the minidimension technique because once we decide on the value bands,
it is quite impractical to change to a different set of bands at a later time. If users
insist on access to a specific raw data value, such as a credit bureau score that is
updated monthly, it also should be included in the fact table, in addition to
being represented as a value band in the demographic minidimension.
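As a sketch, the distinct banded combinations that become minidimension rows could be
identified along the following lines; the customer_staging table, the attribute names,
and the band boundaries are all illustrative assumptions, and a surrogate demographics
key would then be assigned to each resulting row during the load:

-- Identify each unique combination of banded demographic values.
SELECT DISTINCT
       age,
       gender,
       number_of_children,
       CASE
           WHEN annual_income < 20000 THEN 'Under $20,000'
           WHEN annual_income < 30000 THEN '$20,000 - $29,999'
           WHEN annual_income < 40000 THEN '$30,000 - $39,999'
           ELSE '$40,000 and above'
       END AS income_band
FROM   customer_staging;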

(Figure callout: such values are available in the minidimension as well as in the fact table.)

Every time we build a fact table row, we include two foreign keys related to the
customer: the regular customer dimension key and the minidimension demographics
key. As shown in Figure 6.4, the demographics key should be part of
the fact table’s set of foreign keys in order to provide efficient access to the fact
table through the demographics attributes. This design delivers browsing and
constraining performance benefits by providing a smaller point of entry to the
facts. Queries can avoid the huge customer dimension table altogether unless
attributes from that table are constrained.
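For example, a query that enters the facts solely through the small minidimension might
look like the following sketch, with all table and column names assumed for illustration:

-- Constrain and group the facts using only the demographics minidimension,
-- never touching the multimillion-row customer dimension.
SELECT d.income_band,
       SUM(f.revenue) AS total_revenue
FROM   revenue_fact f,
       demographics_minidimension d
WHERE  f.demographics_key = d.demographics_key
  AND  d.gender = 'Female'
  AND  d.number_of_children >= 2
GROUP BY d.income_band;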

When the demographics key participates as a foreign key in the fact table,
another benefit is that the fact table serves to capture the demographic profile
changes. Let’s presume that we are loading data into a periodic snapshot fact
table on a monthly basis. Referring back to our sample demographic minidimension
sample rows in Table 6.3, if one of our customers, John Smith, was 24
years old with an income of $24,000, we’d begin by assigning demographics
key 2 when loading the fact table. If John has a birthday several weeks later,
we’d assign demographics key 18 when the fact table was next loaded. The
demographics key on the earlier fact table rows for John would not be
changed. In this manner, the fact table tracks the age change. We’d continue to
assign demographics key 18 when the fact table is loaded until there’s another
change in John’s demographic profile. If John receives a raise to $26,000 several
months later, a new demographics key would be reflected in the next fact
table load. Again, the earlier rows would be unchanged. Historical demographic
profiles for each customer can be constructed at any time by referring
to the fact table and picking up the simultaneous customer key and its contemporary
demographics key, which in general will be different from the most
recent demographics key.
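As an illustration, John Smith's demographic history could be reconstructed along these
lines, again using the assumed table and column names from the earlier sketches plus an
assumed month dimension:

-- Pair each monthly fact row's customer key with its contemporary
-- demographics key to rebuild the profile history over time.
SELECT DISTINCT
       m.month_key,
       m.month_name,
       d.age,
       d.income_band
FROM   revenue_fact f,
       month_dimension m,
       customer_dimension c,
       demographics_minidimension d
WHERE  f.month_key = m.month_key
  AND  f.customer_key = c.customer_key
  AND  f.demographics_key = d.demographics_key
  AND  c.customer_name = 'John Smith'
ORDER BY m.month_key;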
The minidimension terminology applies when the demographics key is part of the
fact table's composite key; if the demographics key is instead a foreign key in the
customer dimension, we refer to the demographics table as an outrigger.

The best approach for efficiently browsing and tracking changes of key attributes in
really huge dimensions is to break off one or more minidimensions from the dimension
table, each consisting of small clumps of attributes that have been administered
to have a limited number of values.

Variable-Width Attribute Set
When using external prospect lists, we often are permitted only a one-time use
of the list and don’t have the legal right to store the prospect information internally.
However, if we’ve generated our own prospect information, it certainly
can be stored in the data warehouse. Let’s assume that we’re capturing metrics,
perhaps associated with solicitation or quote-generation events that apply to
both prospects and customers. We could store the prospects and customers
together in a single contact dimension; however, there is a significant disparity
between the numbers of attributes for prospective versus customer contacts. As
illustrated in Figure 6.6, we may know only a handful of identification and location
attributes about our prospects. On the other hand, we may know 50 additional
attributes for a customer, covering purchase, payment, credit and service
behaviors, directly elicited profile attributes, and third-party purchased demographic
attributes. In the world of electronic retailing, we can equate prospects with the
anonymous Web site visitors, as opposed to our registered customers.

If we’re dealing with a database platform that supports variable-width rows,
such as Oracle, we may be able to build a single dimension with the full complement
of attributes if the total attributes list is not too long; in some of these
cases, we don’t need to worry about all the prospects’ null columns because
they take up virtually zero disk space. However, if we have a fixed-width database,
or if the attributes list is very long, we are uncomfortable with all the
empty columns for the prospects. In this case, as shown in Figure 6.6, we break
the dimension into a 10-million-row base dimension table consisting of attributes
that are common to both prospects and customers, along with a 1-million-row
customer minidimension that contains the additional attributes we know
about our customers. Again, we include two foreign keys in the fact table.
Nine of ten fact table rows would join to an empty customer row in the
extended customer minidimension.
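A rough sketch of this split design follows; every table and column name, including the
quote_fact table that ties the two dimensions together, is an illustrative assumption:

-- Base dimension shared by prospects and customers.
CREATE TABLE contact_dimension (
    contact_key   INTEGER PRIMARY KEY,
    contact_name  VARCHAR(100),
    city          VARCHAR(50),
    state         VARCHAR(30)
);

-- Customer-only attributes; prospects' fact rows point to a single
-- "not a customer" placeholder row in this table.
CREATE TABLE extended_customer_dimension (
    extended_customer_key  INTEGER PRIMARY KEY,
    payment_behavior       VARCHAR(30),
    credit_behavior        VARCHAR(30),
    purchased_demographics VARCHAR(50)
);

-- The fact table carries both foreign keys.
CREATE TABLE quote_fact (
    contact_key            INTEGER NOT NULL REFERENCES contact_dimension,
    extended_customer_key  INTEGER NOT NULL REFERENCES extended_customer_dimension,
    date_key               INTEGER NOT NULL,
    quote_count            INTEGER
);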
Behavior Study Groups
The secret to building complex behavioral study group queries is to capture the keys
of the customers or products whose behavior you are tracking. You then use the captured
keys to constrain other fact tables without having to rerun the original behavior
analysis.
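A minimal sketch of this capture-and-reuse pattern, assuming a simple
study_group_customers table to hold the captured keys and reusing the illustrative
table names from the earlier examples:

-- Table to hold the captured keys of the study group.
CREATE TABLE study_group_customers (
    customer_key INTEGER PRIMARY KEY
);

-- Capture the group once, here defined as customers observed in the top income band.
INSERT INTO study_group_customers (customer_key)
SELECT DISTINCT f.customer_key
FROM   revenue_fact f,
       demographics_minidimension d
WHERE  f.demographics_key = d.demographics_key
  AND  d.income_band = '$40,000 and above';

-- Later, constrain a different fact table with the captured keys
-- instead of rerunning the original behavioral analysis.
SELECT SUM(s.solicitation_count) AS solicitations_to_group
FROM   solicitation_fact s,
       study_group_customers g
WHERE  s.customer_key = g.customer_key;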

Variable-Depth Hierarchies
Representing an arbitrary, ragged organization hierarchy is an inherently difficult
task in a relational environment. For example, we may want to report the
revenues for a set of commercial customers who have intricate relationships
with each other, such as in Figure 6.8. Each square on the diagram represents
an individual customer entity connected in an organizational tree. The illustrated
organization has four levels; other customer organizations may have
one, ten, or more levels. Let’s assume that we sell our products or services to
any of these commercial customers. Thus the customer dimension rows can
play the role of parent as well as child. We may want to look at the customers
and their sales revenue individually. At other times, we may want to summarize
revenue to any node in the overall organizational tree.
Instead of using a recursive pointer, we insert a bridge table between the customer
dimension and fact tables, as depicted in Figure 6.9. The bridge table has
been called a helper or associative table in the past, but going forward, we’ll consistently
use the bridge terminology. Use of the bridge table is optional; neither
the customer dimension table nor the fact table has to be modified in any way.
If the bridge table is left out, the customer dimension table joins to the fact
table in the usual way. We can report revenue by customer, but we’re unable to
navigate the organization hierarchy. When the bridge table is inserted between
the customer dimension and fact tables, we’re able to analyze revenue results
at any hierarchical level using standard SQL, albeit via a more complicated
presentation.
The bridge table contains one row for each pathway in Figure 6.8 from a customer
entity to each subsidiary beneath it, as well as a row for the zero-length pathway
from a customer to itself. Each pathway row contains the customer key of the parent
roll-up entity, the customer key of the subsidiary entity, the number of levels
between the parent and the subsidiary, a bottom-most flag that identifies a subsidiary
with no further nodes beneath it, and finally, a top-most flag to indicate
that there are no further nodes above the parent. The sample bridge table rows
corresponding to the hierarchy in Figure 6.8 are shown in Table 6.4.
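A sketch of the bridge table structure implied by this description follows; the column
names are illustrative assumptions that map to the attributes just listed:

CREATE TABLE customer_bridge (
    parent_customer_key     INTEGER NOT NULL,
    subsidiary_customer_key INTEGER NOT NULL,
    levels_from_parent      INTEGER NOT NULL,   -- 0 for the customer itself
    bottom_flag             CHAR(1) NOT NULL,   -- 'Y' if the subsidiary has no children
    top_flag                CHAR(1) NOT NULL,   -- 'Y' if the parent has no parent
    PRIMARY KEY (parent_customer_key, subsidiary_customer_key)
);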

When we want to descend the organization hierarchy, we join the tables
together as shown in Figure 6.9. We can now constrain the customer table to a
particular parent customer and request any aggregate measure of all the subsidiaries
at or below that customer. We can use the “# of Levels from Parent”
column in the organization bridge table to control the depth of the analysis.
Constraining to a value of 1 would give all the direct subsidiaries of the customer.
A value greater than zero would give all subsidiary customers but not
the original parent. We can use the “Bottom Flag” column to jump directly to all
the bottom-most customer entities but omit all higher-level customer entities.
If we wanted
to sum up revenue in the fact table for a given customer and all its subsidiaries,
the SQL code would look something like the following:
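(The query below is a reconstruction; the customer_dimension, customer_bridge, and
revenue_fact names follow the earlier sketches, and the parent company name is a
placeholder.)

-- Sum revenue for a chosen parent customer and all of its subsidiaries.
SELECT c.customer_name,
       SUM(f.revenue) AS total_revenue
FROM   customer_dimension c,
       customer_bridge b,
       revenue_fact f
WHERE  c.customer_key = b.parent_customer_key
  AND  b.subsidiary_customer_key = f.customer_key
  AND  c.customer_name = 'Parent Company ABC'
GROUP BY c.customer_name;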



We can request all the revenue from the organizations associated with many
parents at once, but we have to get the subsidiary keys distinctly or risk double
counting. In the following example we retrieve the January 2002 revenue
from all organizations whose parents are located in San Francisco. The SQL
code is messier, but it works for both unique and multiple parent customers.
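(Again a reconstruction, under the same naming assumptions plus an assumed
month_dimension table and a city attribute on the customer dimension.)

-- January 2002 revenue for all organizations whose parent customers are in
-- San Francisco; the subquery gathers each qualifying subsidiary key only once.
SELECT SUM(f.revenue) AS total_revenue
FROM   revenue_fact f,
       month_dimension m
WHERE  f.month_key = m.month_key
  AND  m.month_name = 'January'
  AND  m.calendar_year = 2002
  AND  f.customer_key IN
       (SELECT DISTINCT b.subsidiary_customer_key
        FROM   customer_dimension c,
               customer_bridge b
        WHERE  c.customer_key = b.parent_customer_key
          AND  c.city = 'San Francisco');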

Be very careful when simultaneously joining a single dimension table to two fact
tables of different cardinality. In many cases, relational systems will return the
wrong answer. A similar problem arises when joining two fact tables of different
granularity together directly.
Consider the case in which we have a fact table of customer solicitations and
another fact table with the customer responses resulting both from the solicitations
and other independent sources. There is a one-to-many relationship
between customer and solicitation and another one-to-many relationship
between customer and response. The solicitation and response fact tables have
different cardinalities; in other words, not every solicitation results in a
response (unfortunately for the marketing department), and some responses
are received for which there is no solicitation. Simultaneously joining the solicitation
fact table to the customer dimension, which is in turn joined to the
response fact table, does not return the correct answer in a relational DBMS
because of the cardinality differences. Fortunately, this problem is easily
avoided. We simply issue multipass SQL code to query the solicitation and
response tables in separate queries and then outer join the two answer sets.
The multipass approach has additional benefits in terms of better controlling
performance parameters, in addition to supporting queries that combine data
from fact tables in different physical locations.
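A minimal sketch of the multipass pattern follows, assuming solicitation_fact and
response_fact tables keyed by customer_key; in practice the two passes often run as
separate queries and the merge is performed by the query tool rather than a single
FULL OUTER JOIN statement:

-- Aggregate each fact table separately to the customer grain,
-- then outer join the two answer sets.
SELECT COALESCE(s.customer_key, r.customer_key) AS customer_key,
       s.solicitation_count,
       r.response_count
FROM  (SELECT customer_key, COUNT(*) AS solicitation_count
       FROM   solicitation_fact
       GROUP BY customer_key) s
FULL OUTER JOIN
      (SELECT customer_key, COUNT(*) AS response_count
       FROM   response_fact
       GROUP BY customer_key) r
ON     s.customer_key = r.customer_key;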
