Free Gyaan: Chapter 2 : Retail

Chapter 2 : Retail

Four-Step Dimensional Design Process

1. Select the business process to model.

2. Declare the grain of the business process.

3. Choose the dimensions that apply to each fact table row

4. Identify the numeric facts that will populate each fact table row

Step 2. Declare the Grain

Once the business process has been identified, the data warehouse team faces

a serious decision about the granularity. What level of data detail should be

made available in the dimensional model? This brings us to an important

design tip.

Preferably you should develop dimensional models for the most atomic information

captured by a business process. Atomic data is the most detailed information collected;

such data cannot be subdivided further.

Why date dimentions are needed

Some designers pause at this point to ask why an explicit date dimension table

is needed. They reason that if the date key in the fact table is a date-type field,

then any SQL query can directly constrain on the fact table date key and use

natural SQL date semantics to filter on month or year while avoiding a supposedly

expensive join. This reasoning falls apart for several reasons. First of

all, if our relational database can’t handle an efficient join to the date dimension

table, we’re already in deep trouble. Most database optimizers are quite

efficient at resolving dimensional queries; it is not necessary to avoid joins like

the plague. Also, on the performance front, most databases don’t index SQL

date calculations, so queries constraining on an SQL-calculated field wouldn’t

take advantage of an index.

In terms of usability, the typical business user is not versed in SQL date semantics,

so he or she would be unable to directly leverage inherent capabilities

associated with a date data type. SQL date functions do not support filtering

by attributes such as weekdays versus weekends, holidays, fiscal periods, seasons,

or major events. Presuming that the business needs to slice data by these

nonstandard date attributes, then an explicit date dimension table is essential.

At the bottom line, calendar logic belongs in a dimension table, not in the

application code. Finally, we’re going to suggest that the date key is an integer

rather than a date data type anyway. An SQL-based date key typically is 8 bytes,

so you’re wasting 4 bytes in the fact table for every date key in every row. More

will be said on this later in this chapter.

Casual Dimension

Promotion Dimension

The promotion dimension is potentially the most interesting dimension in our

schema. The promotion dimension describes the promotion conditions under

which a product was sold. Promotion conditions include temporary price

reductions, end-aisle displays, newspaper ads, and coupons. This dimension

is often called a causal dimension (as opposed to a casual dimension) because

it describes factors thought to cause a change in product sales.

Null Values

Typically, many sales transaction line items involve products that are not being

promoted. We will need to include a row in the promotion dimension, with its

own unique key, to identify “No Promotion in Effect” and avoid a null promotion

key in the fact table. Referential integrity is violated if we put a null in a

fact table column declared as a foreign key to a dimension table. In addition to

the referential integrity alarms, null keys are the source of great confusion to

our users because they can’t join on null keys.

You must avoid null keys in the fact table. A proper design includes a row in the

corresponding dimension table to identify that the dimension is not applicable

to the measurement

NonAdditive Facts

Percentages and ratios, such as gross margin, are nonadditive. The numerator and

denominator should be stored in the fact table. The ratio can be calculated in a data

access tool for any slice of the fact table by remembering to calculate the ratio of

the sums, not the sum of the ratios

Unit price is also a nonadditive fact. Attempting to sum up unit price across any

of the dimensions results in a meaningless, nonsensical number

Product Dimension

An important function of the product master is to hold the many descriptive

attributes of each SKU. The merchandise hierarchy is an important group of

attributes. Typically, individual SKUs roll up to brands. Brands roll up to

categories, and categories roll up to departments. Each of these is a many-toone

relationship. This merchandise hierarchy and additional attributes are

detailed for a subset of products in Figure 2.6.

Drill Down

A reasonable product dimension table would have 50 or more descriptive

attributes. Each attribute is a rich source for constraining and constructing row

headers. Viewed in this manner, we see that drilling down is nothing more

than asking for a row header that provides more information. Let’s say we

have a simple report where we’ve summarized the sales dollar amount and

quantity by department.

If we want to drill down, we can drag virtually any other attribute, such as

brand, from the product dimension into the report next to department, and we

automatically drill down to this next level of detail. Atypical drill down within

the merchandise hierarchy would look like this:

We have belabored the examples of drilling down in order to make a point,

which we will express as a design principle.

Drilling down in a data mart is nothing more than adding row headers from the

dimension tables. Drilling up is removing row headers. We can drill down or up on

attributes from more than one explicit hierarchy and with attributes that are part of

no hierarchy.

Degenerate dimension

Although the POS transaction number looks like a dimension key in the fact

table, we have stripped off all the descriptive items that might otherwise fall in

a POS transaction dimension. Since the resulting dimension is empty, we refer

to the POS transaction number as a degenerate dimension (identified by the DD

notation in Figure 2.10). The natural operational ticket number, such as the

POS transaction number, sits by itself in the fact table without joining to a

dimension table. Degenerate dimensions are very common when the grain of

a fact table represents a single transaction or transaction line item because the

degenerate dimension represents the unique identifier of the parent. Order

numbers, invoice numbers, and bill-of-lading numbers almost always appear

as degenerate dimensions in a dimensional model.

Operational control numbers such as order numbers, invoice numbers, and bill-oflading

numbers usually give rise to empty dimensions and are represented as degenerate

dimensions (that is, dimension keys without corresponding dimension tables)

in fact tables where the grain of the table is the document itself or a line item in the

document.

Factless Fact table

Promotion Coverage Factless Fact Table

Regardless of the handling of the promotion dimension, there is one important

question that cannot be answered by our retail sales schema: What products

were on promotion but did not sell? The sales fact table only records the SKUs

actually sold. There are no fact table rows with zero facts for SKUs that didn’t

sell because doing so would enlarge the fact table enormously. In the relational

world, a second promotion coverage or event fact table is needed to help

answer the question concerning what didn’t happen. The promotion coverage

fact table keys would be date, product, store, and promotion in our case study.

This obviously looks similar to the sales fact table we just designed; however,

the grain would be significantly different. In the case of the promotion coverage

fact table, we’d load one row in the fact table for each product on promotion

in a store each day (or week, since many retail promotions are a week in

duration) regardless of whether the product sold or not. The coverage fact

table allows us to see the relationship between the keys as defined by a promotion,

independent of other events, such as actual product sales. We refer to

it as a factless fact table because it has no measurement metrics; it merely captures

the relationship between the involved keys. To determine what products

where on promotion but didn’t sell requires a two-step process. First, we’d

query the promotion coverage table to determine the universe of products that

were on promotion on a given day. We’d then determine what products sold

from the POS sales fact table. The answer to our original question is the set difference

between these two lists of products.

Snlowflaking

Dimension table normalization typically is referred to as snowflaking

The dimension tables should remain as flat tables physically. Normalized,

snowflaked dimension tables penalize cross-attribute browsing and prohibit the use

of bit-mapped indexes. Disk space savings gained by normalizing the dimension tables

typically are less than 1 percent of the total disk space needed for the overall

schema. We knowingly sacrifice this dimension table space in the spirit of performance

and ease-of-use advantages.

Centipede fact

Most business processes can be represented with less than 15 dimensions in

the fact table. If our design has 25 or more dimensions, we should look for

ways to combine correlated dimensions into a single dimension. Perfectly correlated

attributes, such as the levels of a hierarchy, as well as attributes with a

reasonable statistical correlation, should be part of the same dimension. You

have made a good decision to combine dimensions when the resulting new

single dimension is noticeably smaller than the Cartesian product of the separate

dimensions.

Free Gyaan

Pages

Monday, 29 December 2014

Chapter 2 : Retail

No comments:

Post a Comment