Monday 29 December 2014

Chapter 6: Customer Relationship Management

As data is created on the operational side of the CRM equation, we obviously
need to store and analyze the historical metrics resulting from our customer
interaction and transaction systems. Sounds familiar, doesn’t it? The data
warehouse sits at the core of CRM. It serves as the repository to collect and
integrate the breadth of customer information found in our operational systems,
as well as from external sources. The data warehouse is the foundation
that supports the panoramic 360-degree view of our customers, including customer
data from the following typical sources: transactional data, interaction
data (solicitations, call center), demographic and behavioral data (typically
augmented by third parties), and self-provided profile data.
Analytic CRM is enabled via accurate, integrated, and accessible customer
data in the warehouse. We are able to measure the effectiveness of decisions
made in the past in order to optimize future interactions. Customer data can be
leveraged to better identify up-sell and cross-sell opportunities, pinpoint inefficiencies,
generate demand, and improve retention. In addition, we can leverage
the historical, integrated data to generate models or scores that close the
loop back to the operational world. Recalling the major components of a warehouse
environment from Chapter 1, we can envision the model results pushed
back to where the relationship is operationally managed (for example, sales
rep, call center, or Web site), as illustrated in Figure 6.1. The model output can
translate into specific proactive or reactive tactics recommended for the next
point of customer contact, such as the appropriate next product offer or anti-attrition
response. The model results also are retained in the data warehouse for subsequent
analysis.

Obviously, as the organization becomes more centered on the customer, so
must the data warehouse. CRM inevitably will drive change in the data warehouse.
Data warehouses will grow even more rapidly as we collect more and
more information about our customers, especially from front-office sources
such as the field force. Our data staging processes will grow more complicated
as we match and integrate data from multiple sources. Most important, the
need for a conformed customer dimension becomes even more paramount.

Packaged CRM
In response to the urgent business need for CRM, project teams may be wrestling
with a buy-versus-build decision. In the long run, the build approach
may match the organization’s requirements better than the packaged application,
but the implementation likely will take longer and require more
resources, potentially at a higher cost. Buying a packaged application will
deliver a practically ready-to-go solution, but it may not focus on the integration
and interface issues needed for it to function in the larger IT context. Fortunately,
some providers are supporting common data interchange through
Extensible Markup Language (XML), publishing their data specifications so
that IT can extract dimension and fact data, and supporting customer-specific
conformed dimensions.

Customer Dimension
The conformed customer dimension is a critical element for effective CRM. A
well-maintained, well-deployed conformed customer dimension is the cornerstone
of sound customer-centric analysis.
The customer dimension is typically the most challenging dimension for any
data warehouse. In a large organization, the customer dimension can be
extremely deep (with millions of rows), extremely wide (with dozens or even
hundreds of attributes), and sometimes subject to rather rapid change. One
leading direct marketer maintains over 3,000 attributes about its customers.
Any organization that deals with the general public needs an individual
human being dimension. The biggest retailers, credit card companies, and
government agencies have monster customer dimensions whose sizes exceed
100 million rows. To further complicate matters, the customer dimension often
represents an amalgamation of data from multiple internal and external source
systems.

Dimension Outriggers for a Low-Cardinality Attribute Set
As we said in Chapter 2, a dimension is said to be snowflaked when the low-cardinality
columns in the dimension have been removed to separate normalized
tables that then link back into the original dimension table. Generally,
snowflaking is not recommended in a data warehouse environment because it
almost always makes the user presentation more complex, in addition to having
a negative impact on browsing performance. Despite this prohibition
against snowflaking, there are some situations where you should build a
dimension outrigger that has the appearance of a snowflaked table. Outriggers
have special characteristics that cause them to be permissible snowflakes.

In Figure 6.3, the dimension outrigger is a set of data from an external data
provider consisting of 150 demographic and socioeconomic attributes regarding
the customers’ county of residence. The data for all customers residing in a given
county is identical. Rather than repeating this large block of data for every customer
within a county, we opt to model it as an outrigger. There are several factors
that cause us to bend our no-snowflake rule. First of all, the demographic
data is available at a significantly different grain than the primary dimension
data (county versus individual customer). The data is administered and loaded
at different times than the rest of the data in the customer dimension. Also, we
really do save significant space in this case if the underlying customer dimension
is large. If you have a query tool that insists on a classic star schema with no
snowflakes, you can hide the outrigger under a view declaration.
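For instance, such a view might be declared along the following lines, where
customer_dimension, county_demographics, and the two demographic columns shown are
illustrative names standing in for the roughly 150 purchased attributes:

-- Hide the county demographics outrigger behind a flat customer view.
-- All table and column names here are illustrative assumptions.
CREATE VIEW customer_flat AS
SELECT c.customer_key,
       c.customer_name,
       c.county_name,
       cd.total_population,
       cd.median_household_income
FROM   customer_dimension c,
       county_demographics cd
WHERE  c.county_key = cd.county_key;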
Dimension outriggers are permissible, but they should be the exception rather than
the rule. A red warning flag should go up if your design is riddled with outriggers;
you may have succumbed to the temptation to overly normalize the design.

Large Changing Customer Dimensions
Multimillion-row customer dimensions present two unique challenges that
warrant special treatment. Even if a clean, flat dimension table has been implemented,
it generally takes too long to constrain or browse among the relationships
in such a big table. In addition, it is difficult to use our tried-and-true
techniques from Chapter 4 for tracking changes in these large dimensions. We
probably don’t want to use the type 2 slowly changing dimension technique
and add more rows to a customer dimension that already has millions of rows
in it. Unfortunately, huge customer dimensions are even more likely to change
than moderately sized dimensions. We sometimes call this situation a rapidly
changing monster dimension!

Minidimensions
Business users often want to track the myriad of customer attribute changes.
In some businesses, tracking change is not merely a nice-to-have analytic capability.
Insurance companies, for example, must update information about their
customers and their specific insured automobiles or homes because it is critical
to have an accurate picture of these dimensions when a policy is approved
or claim is made.
Fortunately, a single technique comes to the rescue to address both the browsing-
performance and change-tracking challenges. The solution is to break off
frequently analyzed or frequently changing attributes into a separate dimension,
referred to as a minidimension. For example, we could create a separate
minidimension for a package of demographic attributes, such as age, gender,
number of children, and income level, presuming that these columns get used
extensively. There would be one row in this minidimension for each unique
combination of age, gender, number of children, and income level encountered
in the data, not one row per customer. These columns are the ones that
are analyzed to select an interesting subset of the customer base. In addition,
users want to track changes to these attributes. We leave behind more constant
or less frequently queried attributes in the original huge customer table.
Sample rows for a demographic minidimension are illustrated in Table 6.3.
When creating the minidimension, continuously variable attributes, such as
income and total purchases, should be converted to banded ranges. In other
words, we force the attributes in the minidimension to take on a relatively small
number of discrete values. Although this restricts use to a set of predefined
bands, it drastically reduces the number of combinations in the minidimension.
If we stored income at a specific dollar and cents value in the minidimension,
when combined with the other demographic attributes, we could end up with as
many rows in the minidimension as in the main customer dimension itself. The
use of band ranges is probably the most significant compromise associated
with the minidimension technique because once we decide on the value bands,
it is quite impractical to change to a different set of bands at a later time. If users
insist on access to a specific raw data value, such as a credit bureau score that is
updated monthly, it also should be included in the fact table, in addition to
being represented as a value band in the demographic minidimension.
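As a sketch, the distinct banded combinations that become minidimension rows could be
identified along the following lines; the customer_staging table, the attribute names,
and the band boundaries are all illustrative assumptions, and a surrogate demographics
key would then be assigned to each resulting row during the load:

-- Identify each unique combination of banded demographic values.
SELECT DISTINCT
       age,
       gender,
       number_of_children,
       CASE
           WHEN annual_income < 20000 THEN 'Under $20,000'
           WHEN annual_income < 30000 THEN '$20,000 - $29,999'
           WHEN annual_income < 40000 THEN '$30,000 - $39,999'
           ELSE '$40,000 and above'
       END AS income_band
FROM   customer_staging;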

(Figure callout: such values are available in the minidimension as well as in the fact table.)

Every time we build a fact table row, we include two foreign keys related to the
customer: the regular customer dimension key and the minidimension demographics
key. As shown in Figure 6.4, the demographics key should be part of
the fact table’s set of foreign keys in order to provide efficient access to the fact
table through the demographics attributes. This design delivers browsing and
constraining performance benefits by providing a smaller point of entry to the
facts. Queries can avoid the huge customer dimension table altogether unless
attributes from that table are constrained.
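For example, a query that enters the facts solely through the small minidimension might
look like the following sketch, with all table and column names assumed for illustration:

-- Constrain and group the facts using only the demographics minidimension,
-- never touching the multimillion-row customer dimension.
SELECT d.income_band,
       SUM(f.revenue) AS total_revenue
FROM   revenue_fact f,
       demographics_minidimension d
WHERE  f.demographics_key = d.demographics_key
  AND  d.gender = 'Female'
  AND  d.number_of_children >= 2
GROUP BY d.income_band;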

When the demographics key participates as a foreign key in the fact table,
another benefit is that the fact table serves to capture the demographic profile
changes. Let’s presume that we are loading data into a periodic snapshot fact
table on a monthly basis. Referring back to our sample demographic minidimension
sample rows in Table 6.3, if one of our customers, John Smith, was 24
years old with an income of $24,000, we’d begin by assigning demographics
key 2 when loading the fact table. If John has a birthday several weeks later,
we’d assign demographics key 18 when the fact table was next loaded. The
demographics key on the earlier fact table rows for John would not be
changed. In this manner, the fact table tracks the age change. We’d continue to
assign demographics key 18 when the fact table is loaded until there’s another
change in John’s demographic profile. If John receives a raise to $26,000 several
months later, a new demographics key would be reflected in the next fact
table load. Again, the earlier rows would be unchanged. Historical demographic
profiles for each customer can be constructed at any time by referring
to the fact table and picking up the simultaneous customer key and its contemporary
demographics key, which in general will be different from the most
recent demographics key.
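As an illustration, John Smith's demographic history could be reconstructed along these
lines, again using the assumed table and column names from the earlier sketches plus an
assumed month dimension:

-- Pair each monthly fact row's customer key with its contemporary
-- demographics key to rebuild the profile history over time.
SELECT DISTINCT
       m.month_key,
       m.month_name,
       d.age,
       d.income_band
FROM   revenue_fact f,
       month_dimension m,
       customer_dimension c,
       demographics_minidimension d
WHERE  f.month_key = m.month_key
  AND  f.customer_key = c.customer_key
  AND  f.demographics_key = d.demographics_key
  AND  c.customer_name = 'John Smith'
ORDER BY m.month_key;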
The minidimension terminology applies when the demographics key is part of the
fact table's composite key; if the demographics key is instead a foreign key in the
customer dimension, we refer to the demographics table as an outrigger.

The best approach for efficiently browsing and tracking changes of key attributes in
really huge dimensions is to break off one or more minidimensions from the dimension
table, each consisting of small clumps of attributes that have been administered
to have a limited number of values.

Variable-Width Attribute Set
When using external prospect lists, we often are permitted only a one-time use
of the list and don’t have the legal right to store the prospect information internally.
However, if we’ve generated our own prospect information, it certainly
can be stored in the data warehouse. Let’s assume that we’re capturing metrics,
perhaps associated with solicitation or quote-generation events that apply to
both prospects and customers. We could store the prospects and customers
together in a single contact dimension; however, there is a significant disparity
between the numbers of attributes for prospective versus customer contacts. As
illustrated in Figure 6.6, we may know only a handful of identification and location
attributes about our prospects. On the other hand, we may know 50 additional
attributes for a customer, covering purchase, payment, credit and service
behaviors, directly elicited profile attributes, and third-party purchased demographic
attributes. In the world of electronic retailing, we can equate prospects with the
anonymous Web site visitors, as opposed to our registered customers.

If we’re dealing with a database platform that supports variable-width rows,
such as Oracle, we may be able to build a single dimension with the full complement
of attributes if the total attributes list is not too long; in some of these
cases, we don’t need to worry about all the prospects’ null columns because
they take up virtually zero disk space. However, if we have a fixed-width database,
or if the attributes list is very long, we are uncomfortable with all the
empty columns for the prospects. In this case, as shown in Figure 6.6, we break
the dimension into a 10-million-row base dimension table consisting of attributes
that are common to both prospects and customers, along with a 1-million-row
customer minidimension that contains the additional attributes we know
about our customers. Again, we include two foreign keys in the fact table.
Nine of ten fact table rows would join to an empty customer row in the
extended customer minidimension.
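A rough sketch of this split design follows; every table and column name, including the
quote_fact table that ties the two dimensions together, is an illustrative assumption:

-- Base dimension shared by prospects and customers.
CREATE TABLE contact_dimension (
    contact_key   INTEGER PRIMARY KEY,
    contact_name  VARCHAR(100),
    city          VARCHAR(50),
    state         VARCHAR(30)
);

-- Customer-only attributes; prospects' fact rows point to a single
-- "not a customer" placeholder row in this table.
CREATE TABLE extended_customer_dimension (
    extended_customer_key  INTEGER PRIMARY KEY,
    payment_behavior       VARCHAR(30),
    credit_behavior        VARCHAR(30),
    purchased_demographics VARCHAR(50)
);

-- The fact table carries both foreign keys.
CREATE TABLE quote_fact (
    contact_key            INTEGER NOT NULL REFERENCES contact_dimension,
    extended_customer_key  INTEGER NOT NULL REFERENCES extended_customer_dimension,
    date_key               INTEGER NOT NULL,
    quote_count            INTEGER
);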
Behavior Study Groups
The secret to building complex behavioral study group queries is to capture the keys
of the customers or products whose behavior you are tracking. You then use the captured
keys to constrain other fact tables without having to rerun the original behavior
analysis.
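A minimal sketch of this capture-and-reuse pattern, assuming a simple
study_group_customers table to hold the captured keys and reusing the illustrative
table names from the earlier examples:

-- Table to hold the captured keys of the study group.
CREATE TABLE study_group_customers (
    customer_key INTEGER PRIMARY KEY
);

-- Capture the group once, here defined as customers observed in the top income band.
INSERT INTO study_group_customers (customer_key)
SELECT DISTINCT f.customer_key
FROM   revenue_fact f,
       demographics_minidimension d
WHERE  f.demographics_key = d.demographics_key
  AND  d.income_band = '$40,000 and above';

-- Later, constrain a different fact table with the captured keys
-- instead of rerunning the original behavioral analysis.
SELECT SUM(s.solicitation_count) AS solicitations_to_group
FROM   solicitation_fact s,
       study_group_customers g
WHERE  s.customer_key = g.customer_key;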

Variable-Depth Hierarchies
Representing an arbitrary, ragged organization hierarchy is an inherently difficult
task in a relational environment. For example, we may want to report the
revenues for a set of commercial customers who have intricate relationships
with each other, such as in Figure 6.8. Each square on the diagram represents
an individual customer entity connected in an organizational tree. The illustrated
organization has four levels; other customer organizations may have
one, ten, or more levels. Let’s assume that we sell our products or services to
any of these commercial customers. Thus the customer dimension rows can
play the role of parent as well as child. We may want to look at the customers
and their sales revenue individually. At other times, we may want to summarize
revenue to any node in the overall organizational tree.
Instead of using a recursive pointer, we insert a bridge table between the customer
dimension and fact tables, as depicted in Figure 6.9. The bridge table has
been called a helper or associative table in the past, but going forward, we’ll consistently
use the bridge terminology. Use of the bridge table is optional; neither
the customer dimension table nor the fact table has to be modified in any way.
If the bridge table is left out, the customer dimension table joins to the fact
table in the usual way. We can report revenue by customer, but we’re unable to
navigate the organization hierarchy. When the bridge table is inserted between
the customer dimension and fact tables, we’re able to analyze revenue results
at any hierarchical level using standard SQL, albeit via a more complicated
presentation.
The bridge table contains one row for each pathway in Figure 6.8 from a customer
entity to each subsidiary beneath it, as well as a row for the zero-length pathway
from a customer to itself. Each pathway row contains the customer key of the parent
roll-up entity, the customer key of the subsidiary entity, the number of levels
between the parent and the subsidiary, a bottom-most flag that identifies a subsidiary
with no further nodes beneath it, and finally, a top-most flag to indicate
that there are no further nodes above the parent. The sample bridge table rows
corresponding to the hierarchy in Figure 6.8 are shown in Table 6.4.
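A sketch of the bridge table structure implied by this description follows; the column
names are illustrative assumptions that map to the attributes just listed:

CREATE TABLE customer_bridge (
    parent_customer_key     INTEGER NOT NULL,
    subsidiary_customer_key INTEGER NOT NULL,
    levels_from_parent      INTEGER NOT NULL,   -- 0 for the customer itself
    bottom_flag             CHAR(1) NOT NULL,   -- 'Y' if the subsidiary has no children
    top_flag                CHAR(1) NOT NULL,   -- 'Y' if the parent has no parent
    PRIMARY KEY (parent_customer_key, subsidiary_customer_key)
);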

When we want to descend the organization hierarchy, we join the tables
together as shown in Figure 6.9. We can now constrain the customer table to a
particular parent customer and request any aggregate measure of all the subsidiaries
at or below that customer. We can use the “# of Levels from Parent”
column in the organization bridge table to control the depth of the analysis.
Constraining to a value of 1 would give all the direct subsidiaries of the customer.
A value greater than zero would give all subsidiary customers but not
the original parent. We can use the “Bottom Flag” column to jump directly to all
the bottom-most customer entities but omit all higher-level customer entities.
If we wanted
to sum up revenue in the fact table for a given customer and all its subsidiaries,
the SQL code would look something like the following:
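(The query below is a reconstruction; the customer_dimension, customer_bridge, and
revenue_fact names follow the earlier sketches, and the parent company name is a
placeholder.)

-- Sum revenue for a chosen parent customer and all of its subsidiaries.
SELECT c.customer_name,
       SUM(f.revenue) AS total_revenue
FROM   customer_dimension c,
       customer_bridge b,
       revenue_fact f
WHERE  c.customer_key = b.parent_customer_key
  AND  b.subsidiary_customer_key = f.customer_key
  AND  c.customer_name = 'Parent Company ABC'
GROUP BY c.customer_name;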



We can request all the revenue from the organizations associated with many
parents at once, but we have to get the subsidiary keys distinctly or risk double
counting. In the following example we retrieve the January 2002 revenue
from all organizations whose parents are located in San Francisco. The SQL
code is messier, but it works for both unique and multiple parent customers.
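(Again a reconstruction, under the same naming assumptions plus an assumed
month_dimension table and a city attribute on the customer dimension.)

-- January 2002 revenue for all organizations whose parent customers are in
-- San Francisco; the subquery gathers each qualifying subsidiary key only once.
SELECT SUM(f.revenue) AS total_revenue
FROM   revenue_fact f,
       month_dimension m
WHERE  f.month_key = m.month_key
  AND  m.month_name = 'January'
  AND  m.calendar_year = 2002
  AND  f.customer_key IN
       (SELECT DISTINCT b.subsidiary_customer_key
        FROM   customer_dimension c,
               customer_bridge b
        WHERE  c.customer_key = b.parent_customer_key
          AND  c.city = 'San Francisco');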

Be very careful when simultaneously joining a single dimension table to two fact
tables of different cardinality. In many cases, relational systems will return the
wrong answer. A similar problem arises when joining two fact tables of different
granularity together directly.
Consider the case in which we have a fact table of customer solicitations and
another fact table with the customer responses resulting both from the solicitations
and other independent sources. There is a one-to-many relationship
between customer and solicitation and another one-to-many relationship
between customer and response. The solicitation and response fact tables have
different cardinalities; in other words, not every solicitation results in a
response (unfortunately for the marketing department), and some responses
are received for which there is no solicitation. Simultaneously joining the solicitation
fact table to the customer dimension, which is in turn joined to the
response fact table, does not return the correct answer in a relational DBMS
because of the cardinality differences. Fortunately, this problem is easily
avoided. We simply issue multipass SQL code to query the solicitation and
response tables in separate queries and then outer join the two answer sets.
The multipass approach has additional benefits in terms of better controlling
performance parameters, in addition to supporting queries that combine data
from fact tables in different physical locations.
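A minimal sketch of the multipass pattern follows, assuming solicitation_fact and
response_fact tables keyed by customer_key; in practice the two passes often run as
separate queries and the merge is performed by the query tool rather than a single
FULL OUTER JOIN statement:

-- Aggregate each fact table separately to the customer grain,
-- then outer join the two answer sets.
SELECT COALESCE(s.customer_key, r.customer_key) AS customer_key,
       s.solicitation_count,
       r.response_count
FROM  (SELECT customer_key, COUNT(*) AS solicitation_count
       FROM   solicitation_fact
       GROUP BY customer_key) s
FULL OUTER JOIN
      (SELECT customer_key, COUNT(*) AS response_count
       FROM   response_fact
       GROUP BY customer_key) r
ON     s.customer_key = r.customer_key;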
