Free Gyaan: Chapter 4: Procurement

Type 1: Overwrite the Value

With the type 1 response, we merely overwrite the old attribute value in the

dimension row, replacing it with the current value. In so doing, the attribute

always reflects the most recent assignment.

The type 1 response is easy to implement, but it does not maintain any history of

prior attribute values.

Before we leave the topic of type 1 changes, there’s one more easily overlooked

catch that you should be aware of. When we use a type 1 response to deal

with the relocation of IntelliKidz, any preexisting aggregations based on the

department value need to be rebuilt. The aggregated data must continue

to tie to the detailed atomic data, where it now appears that IntelliKidz has

always rolled up into the Strategy department.
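
A minimal sketch of the type 1 overwrite and the aggregate rebuild it forces, using Python's built-in sqlite3 module. The table names, column names, and sales amounts are assumptions made up for illustration; only the product key, product description, and department values come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    description TEXT,
    department  TEXT
);
CREATE TABLE sales_fact (product_key INTEGER, sales_amount REAL);
CREATE TABLE sales_by_department (department TEXT, sales_amount REAL);
INSERT INTO product_dim VALUES (12345, 'IntelliKidz 1.0', 'Education');
-- Hypothetical sales amounts.
INSERT INTO sales_fact VALUES (12345, 100.0), (12345, 50.0);
""")

def rebuild_aggregate(conn):
    """Recompute the department aggregate from the atomic fact rows."""
    conn.execute("DELETE FROM sales_by_department")
    conn.execute("""
        INSERT INTO sales_by_department
        SELECT d.department, SUM(f.sales_amount)
        FROM sales_fact f JOIN product_dim d ON d.product_key = f.product_key
        GROUP BY d.department
    """)

def apply_type1(conn, product_key, new_department):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    conn.execute(
        "UPDATE product_dim SET department = ? WHERE product_key = ?",
        (new_department, product_key),
    )
    # Aggregates built on the old value are now stale and must be rebuilt.
    rebuild_aggregate(conn)

rebuild_aggregate(conn)
before = conn.execute("SELECT * FROM sales_by_department").fetchall()
apply_type1(conn, 12345, "Strategy")
after = conn.execute("SELECT * FROM sales_by_department").fetchall()
print(before, after)
```

After the overwrite, all sales roll up under Strategy and no trace of Education remains, which is exactly the loss of history described above.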

Type 2: Add a Dimension Row

We made the claim earlier in this book that one of the primary goals of the data

warehouse was to represent prior history correctly. A type 2 response is the

predominant technique for supporting this requirement when it comes to

slowly changing dimensions.

Using the type 2 approach, when IntelliKidz’s department changes, we issue

a new product dimension row for IntelliKidz to reflect the new department

attribute value. We would then have two product dimension rows for IntelliKidz,

such as the following:

Product Key | Product Description | Department
12345       | IntelliKidz 1.0     | Education
25984       | IntelliKidz 1.0     | Strategy

Now we see why the product dimension key can’t be the SKU number natural

key. We need two different product surrogate keys for the same SKU or physical

barcode. Each of the separate surrogate keys identifies a unique product

attribute profile that was true for a span of time. With type 2 changes, the fact

table is again untouched. We don’t go back to the historical fact table rows to

modify the product key.
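
The type 2 response might be sketched as follows with sqlite3. The SKU value 'SKU-001', the sales amounts, and the table and column names are hypothetical. The sketch also assumes surrogate keys are assigned in increasing order, so the highest key for a SKU is its latest profile.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,  -- surrogate key
    sku         TEXT,                 -- natural key (made-up value below)
    description TEXT,
    department  TEXT
);
CREATE TABLE sales_fact (product_key INTEGER, sales_amount REAL);
INSERT INTO product_dim VALUES (12345, 'SKU-001', 'IntelliKidz 1.0', 'Education');
INSERT INTO sales_fact VALUES (12345, 100.0);
""")

def apply_type2(conn, sku, new_department, new_key):
    """Type 2: add a row with a new surrogate key; old rows stay untouched."""
    # Assumption: the highest surrogate key for a SKU is its latest profile.
    description, = conn.execute(
        "SELECT description FROM product_dim WHERE sku = ? "
        "ORDER BY product_key DESC LIMIT 1",
        (sku,),
    ).fetchone()
    conn.execute(
        "INSERT INTO product_dim VALUES (?, ?, ?, ?)",
        (new_key, sku, description, new_department),
    )

apply_type2(conn, "SKU-001", "Strategy", 25984)
# Facts loaded after the change carry the new surrogate key.
conn.execute("INSERT INTO sales_fact VALUES (25984, 40.0)")

fact_keys = conn.execute(
    "SELECT product_key FROM sales_fact ORDER BY rowid"
).fetchall()
by_department = conn.execute("""
    SELECT d.department, SUM(f.sales_amount)
    FROM sales_fact f JOIN product_dim d ON d.product_key = f.product_key
    GROUP BY d.department ORDER BY d.department
""").fetchall()
print(fact_keys, by_department)
```

The historical fact row keeps key 12345, so sales made while the product belonged to Education continue to roll up there.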

If we constrain only on the department attribute, then we very precisely differentiate

between the two product profiles. If we constrain only on the product

description, that is, IntelliKidz 1.0, then the query automatically will fetch

both IntelliKidz product dimension rows and automatically join to the fact

table for the complete product history. If we need to count the number of products

correctly, then we would just use the SKU natural key attribute as the

basis of the distinct count rather than the surrogate key. The natural key field

becomes a kind of reliable glue that holds the separate type 2 records for a single

product together. Alternatively, a most recent row indicator might be

another useful dimension attribute to allow users to quickly constrain their

query to only the current profiles.
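
To illustrate the counting point, here is a small sketch that contrasts counting on the surrogate key with a distinct count on the natural key, and filters on a most-recent-row indicator. The SKU value and the is_current column name are assumptions, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,  -- surrogate key
    sku         TEXT,                 -- natural key (made-up value)
    description TEXT,
    department  TEXT,
    is_current  INTEGER               -- hypothetical indicator, 1 = most recent row
);
INSERT INTO product_dim VALUES
    (12345, 'SKU-001', 'IntelliKidz 1.0', 'Education', 0),
    (25984, 'SKU-001', 'IntelliKidz 1.0', 'Strategy',  1);
""")

# Counting surrogate keys counts profiles, not products.
profile_count, = conn.execute(
    "SELECT COUNT(product_key) FROM product_dim"
).fetchone()

# A distinct count on the natural key counts physical products.
product_count, = conn.execute(
    "SELECT COUNT(DISTINCT sku) FROM product_dim"
).fetchone()

# The indicator restricts a query to the current profiles only.
current_rows = conn.execute(
    "SELECT product_key, department FROM product_dim WHERE is_current = 1"
).fetchall()
print(profile_count, product_count, current_rows)
```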

Since the type 2 technique spawns new dimension rows, one downside of this

approach is accelerated dimension table growth. Hence it may be an inappropriate

technique for dimension tables that already exceed a million rows.

Type 3: Add a Dimension Column

In our software example, let’s assume that there is a legitimate business need

to track both the old and new values of the department attribute both forward

and backward for a period of time around the change. With a type 3 response,

we do not issue a new dimension row, but rather we add a new column to capture

the attribute change. In the case of IntelliKidz, we alter the product

dimension table to add a prior department attribute. We populate this new column

with the existing department value (Education). We then treat the department

attribute as a type 1 response, where we overwrite to reflect the current

value (Strategy). All existing reports and queries switch over to the new

department description immediately, but we can still report on the old department

value by querying on the prior department attribute.

The type 3 slowly changing dimension technique allows us to see new and historical

fact data by either the new or prior attribute values.

A type 3 response is inappropriate if you want to track the impact of numerous

intermediate attribute values.
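
A type 3 response could look roughly like this in sqlite3. The column name prior_department and the table layout are assumptions. The sketch copies the existing department into the new column for every row before overwriting the changed product, so unchanged products show the same value in both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    description TEXT,
    department  TEXT
);
INSERT INTO product_dim VALUES (12345, 'IntelliKidz 1.0', 'Education');
""")

def apply_type3(conn, product_key, new_department):
    """Type 3: add a prior-value column, then overwrite the current value."""
    # One-time schema change to hold the old value.
    conn.execute("ALTER TABLE product_dim ADD COLUMN prior_department TEXT")
    # Preserve the existing assignment for every row.
    conn.execute("UPDATE product_dim SET prior_department = department")
    # Overwrite the current value, exactly as a type 1 response would.
    conn.execute(
        "UPDATE product_dim SET department = ? WHERE product_key = ?",
        (new_department, product_key),
    )

apply_type3(conn, 12345, "Strategy")
row = conn.execute(
    "SELECT product_key, department, prior_department FROM product_dim"
).fetchone()
print(row)
```

Because there is only one prior column, a second change would overwrite Education, which is why this technique does not suit attributes with many intermediate values.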

Hybrid Slowly Changing Dimension Techniques

Each sales rep dimension row would include all prior district assignments.

The business user could choose to roll up the sales facts with any of the five

district maps. If a sales rep were hired in 2000, the dimension attributes for

1998 and 1999 would contain values along the lines of “Not Applicable.”

We label the most recent assignment as “Current District.” This attribute will

be used most frequently; we don’t want to modify our existing queries and

reports to accommodate next year’s change. When the districts are redrawn

next, we’d alter the table to add a district 2002 attribute. We’d populate this

column with the current district values and then overwrite the current

attribute with the 2003 district assignments.
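
The yearly redistricting step could be sketched as below. The sales rep names, district names, and all table and column names are hypothetical; only the idea of a district 2002 column plus a current district column comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_rep_dim (
    rep_key          INTEGER PRIMARY KEY,
    rep_name         TEXT,
    current_district TEXT
);
-- Made-up reps and districts.
INSERT INTO sales_rep_dim VALUES (1, 'Rep A', 'North'), (2, 'Rep B', 'South');
""")

def redraw_districts(conn, closing_year, new_assignments):
    """Freeze the current map into a year column, then overwrite current."""
    # int() keeps the generated column name safe to splice into SQL.
    column = f"district_{int(closing_year)}"
    conn.execute(f"ALTER TABLE sales_rep_dim ADD COLUMN {column} TEXT")
    conn.execute(f"UPDATE sales_rep_dim SET {column} = current_district")
    conn.executemany(
        "UPDATE sales_rep_dim SET current_district = ? WHERE rep_key = ?",
        [(district, rep_key) for rep_key, district in new_assignments.items()],
    )

# Apply the 2003 assignments; the 2002 map is preserved in its own column.
redraw_districts(conn, 2002, {1: "East", 2: "South"})
reps = conn.execute(
    "SELECT rep_key, district_2002, current_district "
    "FROM sales_rep_dim ORDER BY rep_key"
).fetchall()
print(reps)
```

Queries written against current_district need no change after the redraw, which is the reason for keeping that column name stable.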

In this manner we’re able to use the historical attribute to segment history and

see facts according to the departmental roll-up at that point in time. Meanwhile,

the current attribute rolls up all the historical fact data for product keys

12345 and 25984 into the current department assignment. If IntelliKidz were

then moved into the Critical Thinking software department, our product table

would look like the following:

With this hybrid approach, we issue a new row to capture the change (type

2) and add a new column to track the current assignment (type 3), where

subsequent changes are handled as a type 1 response. Someone once suggested

that we refer to this combo approach as type 6 (2 + 3 + 1). This technique

allows us to track the historical changes accurately while also

supporting the ability to roll up history based on the current assignments.

We could further embellish (and complicate) this strategy by supporting

additional static department roll-up structures, in addition to the current

department, as separate attributes.
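
The combined approach might be sketched like this. The SKU value, the surrogate key 30001 for the third row, and the column names historical_department and current_department are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key           INTEGER PRIMARY KEY,  -- surrogate key
    sku                   TEXT,                 -- natural key (made-up value)
    description           TEXT,
    historical_department TEXT,                 -- true for this row's time span
    current_department    TEXT                  -- same on every row of a SKU
);
INSERT INTO product_dim VALUES
    (12345, 'SKU-001', 'IntelliKidz 1.0', 'Education', 'Strategy'),
    (25984, 'SKU-001', 'IntelliKidz 1.0', 'Strategy',  'Strategy');
""")

def apply_hybrid(conn, sku, new_department, new_key):
    """New row (type 2) plus an overwritten current column (types 3 and 1)."""
    description, = conn.execute(
        "SELECT description FROM product_dim WHERE sku = ? LIMIT 1", (sku,)
    ).fetchone()
    # Type 2: a new row captures the new profile.
    conn.execute(
        "INSERT INTO product_dim VALUES (?, ?, ?, ?, ?)",
        (new_key, sku, description, new_department, new_department),
    )
    # Type 1 on the current column: overwrite it on every row for the SKU.
    conn.execute(
        "UPDATE product_dim SET current_department = ? WHERE sku = ?",
        (new_department, sku),
    )

apply_hybrid(conn, "SKU-001", "Critical Thinking", 30001)  # 30001 is hypothetical
rows = conn.execute(
    "SELECT product_key, historical_department, current_department "
    "FROM product_dim ORDER BY product_key"
).fetchall()
print(rows)
```

Grouping facts by historical_department segments history as it happened, while grouping by current_department restates all history under the latest assignment.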

More Rapidly Changing Dimensions

In this chapter we’ve focused on the typically rather slow, evolutionary

changes to our dimension tables. What happens, however, when the rate of

change speeds up? If a dimension attribute changes monthly, then we’re no

longer dealing with a slowly changing dimension that can be handled reasonably

with the techniques just discussed. One powerful approach for handling

more rapidly changing dimensions is to break off these rapidly changing

attributes into one or more separate dimensions. In our fact table we would

then have two foreign keys—one for the primary dimension table and another

for the rapidly changing attribute(s). These dimension tables would be associated

with one another every time we put a row in the fact table.
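
As a rough sketch of breaking a rapidly changing attribute into its own dimension, the example below uses a hypothetical price band attribute. The attribute, its values, the months, and the sales amounts are invented for illustration. The point is only that the fact row carries two foreign keys and the primary dimension does not grow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
-- Hypothetical rapidly changing attribute, split into its own dimension.
CREATE TABLE price_band_dim (price_band_key INTEGER PRIMARY KEY, price_band TEXT);
CREATE TABLE sales_fact (
    sales_month    TEXT,
    product_key    INTEGER REFERENCES product_dim (product_key),
    price_band_key INTEGER REFERENCES price_band_dim (price_band_key),
    sales_amount   REAL
);
INSERT INTO product_dim VALUES (12345, 'IntelliKidz 1.0');
INSERT INTO price_band_dim VALUES (1, 'Low'), (2, 'Medium'), (3, 'High');
""")

def load_fact(conn, sales_month, product_key, price_band, sales_amount):
    """Associate the two dimensions at the moment the fact row is written."""
    band_key, = conn.execute(
        "SELECT price_band_key FROM price_band_dim WHERE price_band = ?",
        (price_band,),
    ).fetchone()
    conn.execute(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        (sales_month, product_key, band_key, sales_amount),
    )

# The band changes from month to month, yet product_dim gains no rows.
load_fact(conn, "2014-01", 12345, "High", 100.0)
load_fact(conn, "2014-02", 12345, "Medium", 80.0)
load_fact(conn, "2014-03", 12345, "Medium", 60.0)

product_rows, = conn.execute("SELECT COUNT(*) FROM product_dim").fetchone()
by_band = conn.execute("""
    SELECT b.price_band, SUM(f.sales_amount)
    FROM sales_fact f JOIN price_band_dim b
      ON b.price_band_key = f.price_band_key
    GROUP BY b.price_band ORDER BY b.price_band
""").fetchall()
print(product_rows, by_band)
```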

Monday, 29 December 2014
